Most often we have to categorize data in the form of a hierarchy. A nice example would be an e commerce application where we have categories under categories and finally products. In terms of data structure it’s a tree structure with products as it’s leaf nodes. Trees are a special case of graphs.
However when we have only relational databases at our disposal the question arises as how to store a tree in the form of a table. This problem is well known and there are apparently two well known solutions. One is storing these trees as ‘Adjacency Lists’ and second one is ‘Nested Sets’. Adjacency list is really simple and easier to implement as well, but like most other easy things it is inefficient when it comes to handling large and deep trees. Going through most of the open source php softwares like phpBB I realized that they too follow this model.
Nested sets is a very interesting idea and it is more suited for relational databases. A nice simple tutorial is given at http://dev.mysql.com/tech-resources/articles/hierarchical-data.html. Apparently both these approaches solve our problem of storing tree structures in a database. In fact a total book has been written on the topic of how hierarchical information can be stored in relational databases. Interested people may see http://www.openwin.org/mike/reviews/books/joe-celkos-trees-and-hierarchies-in-sql-for-smarties/
If we consider the Web 2.0 sites they heavily use hierarchies. Once we develop a tree its a fixed structure, we can not easily re categorize it. For example consider a simple example:
We have a shopping site that categories products on their type e.g Consumer Electronics etc etc and under these products are categorized by brand names. What if a particular user tomorrow wants the opposite? If he wants the products to be first categorized by brand names and under it by type of the product? How can that be achieved? Certainly it can be achieved by writing a bit more complex queries and some recursive functions. But if the tree is deep to say 10 levels then the possible number of permutations increase heavily. (more details http://www.mcs.vuw.ac.nz/courses/COMP103/2006T3/lectures/32-graphs.pdf)
Information processing getting so demanding these days that soon we will find ourselves in such a situation. Where in each node will be independent of others and bear only some relationship with others. We might have to represent those relationships in many different ways forcing us to create many trees on the fly.
How many trees can be formed out of n nodes??? (http://www.cis.udel.edu/~breech/contest.inet.fall.04/problems/catalan.html)
The solution is simple. Its given by Catalan number given by the formula C(n) = (2n)! / [ (n+1)!n! ]. One day we might have to generate one of these combinations and present, and for that the kind of techniques we have mentioned above will not be suited. We need to find some other way to represent trees in relational database. I am after this problem for quite a long time and thought some of you might give me valuable suggestions.
May be some have already encountered these kind of problems or may be some of you can prove that such a problem would never exist. Or may be it has some trivial solutions. Some very difficult problems when perceived in a totally different perspective seem very simple solution.
Most of the people argue that there cant be a situation where in we might have to generate many combinations of a tree based on relation between nodes. I found one small example where in we might have to do so.
(()()(()))) where each opening and closing bracket might be an expression to be calculated. But if for optimization or something we might have to rearrange these expressions and represent them in a separate format. The problem finally reduces to re arranging a tree based on some rules (mostly what kind of operators we use in those expressions, which are commutative or associative etc etc. )