Selecting proper data structures for the realtime problems

26 Sep

As a software engineer you have to have a proper understanding about data structures and the corresponding implementations. In real world application development you need to think about the efficiency of the operations and it should fit for the entire domain that the solution is addressing. So in this blog post Im going to explain some of the data structures implemented in java.

Look at the following class hierarchy.


1. LinkedList –  Use the linked list If you need to maintain the list which is no need random access to the values and needed frequent insertions and deletions.
Insertion and deletion is O(1) and access the (k)th element is O(n).
But remember LinkedList is not thread safe.

2. ArrayList –  Use the array list if you need the random access to the values frequently and same as you don’t need frequent insertion and remove operations.
Access the (k)th element is O(1) and insertion and deletion is O(n).
Same as LinkedList remember ArrayList is not thread safe.

3. Vector –  When you need to maintain thread safety over the ArrayList use the Vector. If you don’t need any synchronization on each and every operations you have to go for ArrayList otherwise it will give bad performance impact on each operation.

4. PriorityQueue –  We know that queue is a first come first serve basis but some times we need to get the things according to the priority. So if you have that kind of problem use the priority queue. But remember PriorityQueue is not thread safe.

5. HashSet –  It is not maintaining the duplicates. When you need to maintain the unique list of objects you can use the HashSet. HashSet allows the NULL Objects but it is not maintain the insertion sequence. If you need to maintain the insertion sequence you have to use the LinkedHashSet.

6. TreeSet –  Same as HashSet this data structure maintain the duplicate free collection and additionally its provide the sorted order of the elements.TreeSet is not allow the NULL objects. Guarantees log(n) time cost for the basic operations add, remove and contains.

7. HashTable –  This data structure is useful when you need to do the insertion, deletion and the quick access to the given element in constant time. All operations are in O(1).Hash tables are not maintain the insertion sequence. I would like to explain the HashTable little bit in depth because this is commonly using in the industry. As an example a router table. When a packet has to be routed to a specific IP address, the router has to determine the best route by querying the router table in an efficient manner.

To use the Hash tables you have to implement the equals(), hashCode() functions the Object type that you are going to store in the HashTable.
hashCode() – as a best practice you have to consider all attributes in the object to generate the hash code. See the following example.

public class Employee {
    int        employeeId;
    String     name;
    Department dept;
    // other methods would be in here 
    public int hashCode() {
        int hash = 1;
        hash = hash * 17 + employeeId;
        hash = hash * 31 + name.hashCode();
        hash = hash * 13 + (dept == null ? 0 : dept.hashCode());
        return hash;

Once you call the insert method in the hash table it will calculate the hash value and then store the the data according to the hash value.

 int hashValue = hashCode % hash_table_size;

There are several ways to handle the collisions in hash table.
1. Separate Chaining – Maintain the list in each slot. If there are two objects with same hash value it is going to store in that list. This implementation has extra overhead to maintain the list. Total Performance is depend on the number of elements in the list.

2. Open Addressing – In this approach it is find the next available slot according to the probing function.
Linear prob – h(x), h(x)+1, h(x)+2, h(x)+3,..
Quadratic prob – h(x), h(x)+1^2, h(x)+2^2, h(x)+3^2,..
Double hashing – h(x), h(x)+1*h'(x), h(x)+2*h'(x), h(x)+3*h'(x),..

Delete elements in hash table – If hash table is use open addressing then it is logically delete the value from the table by setting a flag.

Performance of the hash table is denoted by alpha load factor.

 alpha = number_of_elements/table_size;

If hash table is filled we need to do the rehashing. Rehashing is done according to the load factor. If load factor reaches to some level we need to do the rehashing. This is the only disadvantage of the HashTable.

8. HashMap –  Hash map is used when you need to store the elements as key, value pairs.In hash map you cannot duplicate the key but you can duplicate the value with different key. This is not maintain the insertion sequence. If you need to maintain the insertion sequence you have to use the LinkedHashMap. Remember HasMap is not thread safe if you need thread safety over HashMap you have to use the ConcurrentHashMap

8. TreeMap –  If you need to maintain the key, value pairs in sorted order this is the data structure suits for that operation. Its guarantees that the given key value pairs arrange in sorted order according to the compareTo() method in Object type which is stored in TreeMap.

Leave a comment

Posted by on September 26, 2014 in data-structures, java, Other


Tags: , ,

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: