Reuse of the concept and design is very popular in software design. In the same way, a good reuse of the processed data in data analytics is very important concept. It saves time and resources being used for processing. By information stack I mean a stacked form of processed data or an analytical platform feeding various products/portals at various stages. If we really want to build an analytical platform that feeds multitude of products at various stages, then wise decision would be to start our own solution for information stack on top of some bare bone distributed systems.
For this purpose, required is a very robust processing engine that generates the information in stages and a robust distributed storage to provide the back end for the processing engine. With information stack concept,various products providing various views of the actual data, we can form a stack of information in an array of product line. The various products require different data content and different format that the information stack provides. Various products need different access capability and latency. This varying needs forces us use stack of the distributed systems with one layer in the stack specialized for particular needs. To this end, distributed file systems, some distributed processing frameworks and finally the distributed online storage (database) forms a good combination.
Besides, the platform, one obvious requirement placed is the sound knowledge of the core of the data structure and the idea of the stages in which the data to be processed for feeding the staged products. If you have a sound understanding of your data and the information requirement of all the products at your hand, you are in the obvious stage of starting it. Also,Choosing right framework among available ones is very crucial. One of the most popular open source candidate is Hadoop/HBase combination. There are so many closed source vendors in the line. The choice between the open source and closed source again depends upon how smart you are at your own data staging and movement. Of course, closed source solutions can not provide you all the customization and flexible data structure for your information stack.
Information stack can be helpful for all data analytics ranging from offline reporting to online browsing. If well designed, an information stack build once can iteratively be expanded to form the good foundation for multitude of products. Google is one good example for building such very dynamic and robust stack, the very foundation of which is it’s search engine.