Tuesday, May 14, 2013

Realtime Big Data

In a post about architecting realtime big data systems, James Kinley splits the design into three layers: batch, serving, and speed.



The architecture has merit even without the specific technologies he employs at each layer. Especially useful is the way it combines batch and realtime views into a best-of-both-worlds query capability. Without an approach like this, you are forced to commit to a particular aggregation of the data either too soon or too late.

It is a speed vs. flexibility problem. Commit too soon, and you will have a fast system that doesn't give you much query flexibility. Commit too late, and you will have a flexible but slow system.

Kinley's architecture shows that you can have flexibility without giving up speed.
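
To make the idea concrete, here is a minimal Python sketch of answering a query by merging a batch view with a realtime view. It is not taken from Kinley's post: the event data, the page-count query, the cutoff, and every function name are assumptions made up for illustration, and plain in-memory counters stand in for whatever technologies would actually back each layer.

```python
from collections import Counter

# Hypothetical event log of (timestamp, page) pairs; purely illustrative data.
events = [
    (1, "home"), (2, "about"), (3, "home"),
    (4, "home"), (5, "pricing"),
]

def build_batch_view(events, batch_cutoff):
    """Batch side: precompute counts over every event up to the cutoff."""
    return Counter(page for ts, page in events if ts <= batch_cutoff)

def build_realtime_view(events, batch_cutoff):
    """Speed side: count only the events the last batch run has not absorbed."""
    return Counter(page for ts, page in events if ts > batch_cutoff)

def query(page, batch_view, realtime_view):
    """Query time: merge both views so the answer covers all the data."""
    return batch_view[page] + realtime_view[page]

batch_cutoff = 3  # assumed timestamp of the most recent batch run
batch_view = build_batch_view(events, batch_cutoff)
realtime_view = build_realtime_view(events, batch_cutoff)

print(query("home", batch_view, realtime_view))  # 2 from batch + 1 from realtime = 3
```

The batch view can be rebuilt from the raw events with a different aggregation whenever the questions change, which is where the flexibility comes from, while the small realtime view keeps answers current between batch runs.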