It’s one thing to support many data sources with megabytes of data. It’s a completely different problem to support thousands of data sources producing terabytes of data every day. How do you create systems that scale infinitely?
The answer is: you don’t. You cannot design for infinite scalability. Rather, consider a pod approach in which each pod supports a defined capacity. Scalability then comes from deploying multiple cooperating pods.
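As a rough illustration of the pod idea, the sketch below estimates how many pods a given workload would need when each pod has a fixed, validated capacity. The `PodCapacity` type, the `pods_needed` function, and all of the numbers are hypothetical, chosen only for the example; real limits would have to come from testing an actual pod.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PodCapacity:
    """Validated limits for a single pod (hypothetical values)."""
    max_sources: int        # data sources one pod can serve
    max_tb_per_day: float   # terabytes one pod can process per day


def pods_needed(sources: int, tb_per_day: float, pod: PodCapacity) -> int:
    """Return the number of pods required to stay within both limits."""
    by_sources = math.ceil(sources / pod.max_sources)
    by_volume = math.ceil(tb_per_day / pod.max_tb_per_day)
    # The tighter constraint decides; always deploy at least one pod.
    return max(by_sources, by_volume, 1)


# Assumed capacity: 500 sources and 2 TB/day per pod.
pod = PodCapacity(max_sources=500, max_tb_per_day=2.0)
print(pods_needed(sources=3000, tb_per_day=9.0, pod=pod))  # prints 6
```

Here the source count, not the data volume, is the binding constraint. Growth is handled by adding pods rather than by stretching a single pod past the capacity it was validated for.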
Systems that handle extremely large data sources with significant processing requirements are difficult to validate, even in the best case. Attempting to deploy such a system without well-understood capacity limits is destined for failure.
This was first presented at Cloud Expo NYC.