Big Data comes from a variety of sources: human activity online generates vast amounts of data every day, whether produced intentionally, accidentally, or without the user's knowledge – social media activity, sensor readings, logs, and more. Content delivery networks (CDNs) can help distribute big data by caching content on servers located closer to users. Pushing content to a CDN offloads work from origin servers and improves performance, but it also segments users and requires replication strategies to maintain consistency. Useful techniques include pre-computing static content from dynamic sources, pushing searches and other functions to the CDN, and experimenting with different cache models. Overall, a CDN can be an effective way to distribute big data, but it also introduces more complexity and a dependence on the CDN provider.
89. Even when prices change, you can still pre-compute the page ahead of time – you don’t need to compute the content while the page is being accessed
93. And even when you offer “Web 2.0” features such as customer ratings, you can asynchronously recompute (parts of) the pages using the new rating information
94. Some bookstore content modifications are not very critical. They can be updated with a lag, on a geographical basis
95. You see: many parts of an online bookstore seem dynamic, but can actually be pre-computed and delivered (lagged) as static content in web terms
96. It’s all about the frequency of change, the distances, the breadth of distribution, and the big data pain
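The pre-computation idea above can be sketched in a few lines of Python. Everything here is illustrative – the product data, the `render_page`/`precompute` helpers, and the toy rating aggregation are assumptions for the sketch, not part of any real bookstore:

```python
import time

# Illustrative product data and a static-content store a CDN could
# pull from (all names here are assumptions for the sketch).
products = {
    "book-42": {"title": "Big Data Basics", "price": 29.90, "rating": 4.2},
}
static_store = {}  # URL path -> (html, generated_at)

def render_page(product_id):
    """Render a product page to plain HTML ahead of access time."""
    p = products[product_id]
    return (f"<h1>{p['title']}</h1>"
            f"<p>Price: {p['price']:.2f} EUR</p>"
            f"<p>Rating: {p['rating']:.1f}/5</p>")

def precompute(product_id):
    """Re-render the page after a change - not while it is accessed."""
    static_store[f"/products/{product_id}.html"] = (
        render_page(product_id), time.time())

def on_rating_submitted(product_id, stars):
    """Ratings update the data now, but the page is recomputed
    asynchronously (here: directly; in reality: a background job)."""
    p = products[product_id]
    p["rating"] = (p["rating"] + stars) / 2  # toy aggregation
    precompute(product_id)

products["book-42"]["price"] = 24.90   # price change ...
precompute("book-42")                  # ... still pre-computable
on_rating_submitted("book-42", 5)      # rating change, async recompute
html, _ = static_store["/products/book-42.html"]
```

The pages end up as plain static files that a CDN can cache and serve; only the (lagged) recompute ever touches the origin.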
100. Even this ultimately dynamic-sounding feature can be (partially) de-dynamized. Consider treating the full-text index as static content – the index, not necessarily the data itself
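As a sketch of that idea: pre-compute an inverted index from the catalogue and serialize it as a static JSON file that a CDN can cache. The documents and the `build_index`/`search` helpers are illustrative assumptions:

```python
import json

# Toy catalogue; in a real shop this would come from the database.
docs = {
    "book-1": "distributed systems and big data",
    "book-2": "cooking with big flavours",
}

def build_index(documents):
    """Pre-compute an inverted index (word -> sorted doc ids).
    The result is plain static content that a CDN can cache."""
    index = {}
    for doc_id, text in documents.items():
        for word in set(text.split()):
            index.setdefault(word, []).append(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

# Serialize the index as a static file: searches can then run at the
# edge (or even in the browser) without hitting the origin database.
static_index_json = json.dumps(build_index(docs), sort_keys=True)

def search(index_json, word):
    """A lookup against the shipped static index."""
    return json.loads(index_json).get(word, [])
```

The index file only needs to be regenerated (lagged) when the catalogue changes; every search in between is a static lookup.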
102. Sure, you cannot pre-compute the shopping cart. But maybe you also don’t need to synchronize a German customer’s cart to the whole world – you can keep it “local” instead
103. Owning big data doesn’t necessarily mean owning 100% dynamic data in web terms
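Keeping a cart “local” might look like this minimal sketch: each region gets its own cart store, and a cart is only ever read or written in its customer’s home region. The region names and the routing rule are made up for illustration:

```python
# Per-region cart stores: a cart lives only in its home region, so it
# never has to be replicated world-wide. Regions and the routing rule
# are illustrative assumptions.
cart_stores = {"eu": {}, "us": {}, "asia": {}}

def home_region(customer):
    # In practice this would come from GeoIP or the customer profile.
    return customer["region"]

def add_to_cart(customer, item):
    store = cart_stores[home_region(customer)]
    store.setdefault(customer["id"], []).append(item)

def get_cart(customer):
    return cart_stores[home_region(customer)].get(customer["id"], [])

hans = {"id": "c-1", "region": "eu"}   # a German customer
add_to_cart(hans, "book-42")
# The cart exists only in the "eu" store; "us" and "asia" never see it.
```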
120. A CDN is like a deputy: you sign a contract, and it takes over parts of your platform
121. From then on, it delivers to your users the content you tell it to deliver – but from much closer to them, and with much more intelligence about managing the load
122. A CDN has its own infrastructure, including nodes placed directly at the backbones, offering web caching, server load balancing, request routing and, built on these techniques, content delivery services
123. As you have seen earlier, the CDN’s DNS infrastructure returned a different IP address each time, with TTL = 20
124. This is done either through DNS cache “splitting” or dynamically, based on the IP address of the name server that made the DNS A query
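The behaviour described above – a different A record per query, with a short TTL – can be simulated with a toy name server. The class, IPs, and hostname are illustrative assumptions, not a real DNS implementation:

```python
import itertools

class CdnNameServer:
    """Toy model of a CDN name server that rotates through cache IPs
    and hands out a short TTL, so clients re-resolve frequently and
    can be re-routed quickly."""

    def __init__(self, cache_ips, ttl=20):
        self._rotation = itertools.cycle(cache_ips)
        self.ttl = ttl

    def resolve_a(self, hostname):
        """Answer an A query with the next cache IP and the TTL."""
        return next(self._rotation), self.ttl

ns = CdnNameServer(["10.2.3.40", "50.6.7.80"])
answers = [ns.resolve_a("shop.example.com") for _ in range(4)]
```

The short TTL is the key: it forces clients back to the name server every 20 seconds, which is the CDN’s chance to re-route them.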
126. You can now expect the returned IP address to lead you to a load balancer – your gate into a whole sub-infrastructure of the CDN, which balances between, for example, web caches
127. [Diagram: a client’s A query to the CDN name server returns cache IPs (e.g. 10.2.3.40, 50.6.7.80); users access the caches (“cache access”), the caches replicate among each other (“inter-cache replication”) and refresh content from your servers (“cache refresh”); only the “last mile” requests reach your servers]
128. The CDN uses different algorithms to decide where to route user requests: based upon current load, cost, location, etc.
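Such a routing decision could be sketched as a simple weighted cost function. The node data, weights, and scoring formula here are invented for illustration:

```python
# Candidate cache nodes with made-up metrics.
nodes = [
    {"name": "fra-1", "load": 0.7, "cost": 1.0, "distance_km": 200},
    {"name": "ams-1", "load": 0.3, "cost": 1.2, "distance_km": 400},
    {"name": "iad-1", "load": 0.1, "cost": 0.8, "distance_km": 6000},
]

def route(nodes, w_load=10.0, w_cost=5.0, w_dist=0.01):
    """Pick the node with the lowest weighted combination of current
    load, delivery cost, and distance to the user."""
    def score(node):
        return (w_load * node["load"]
                + w_cost * node["cost"]
                + w_dist * node["distance_km"])
    return min(nodes, key=score)

best = route(nodes)  # the nearby-but-busy node loses to a less loaded one
```

Real CDNs fold many more signals into this decision, but the principle is the same: rank candidate nodes and send the request to the cheapest one.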
129. But in the end, your content gets delivered to the user. If it expires, the CDN refreshes it from your servers in the background
130. According to HTTP/1.1, a web cache has to be controlled by:
– Freshness
– Validation
– Invalidation
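A sketch of how the first two mechanisms look from the origin’s side, using real HTTP/1.1 header names but illustrative values (invalidation would then correspond to publishing a new ETag or explicitly purging the cache):

```python
from email.utils import formatdate

def make_response_headers(etag, max_age=60):
    """Origin response headers controlling a downstream web cache."""
    return {
        # Freshness: the cache may serve the response for max_age
        # seconds without contacting the origin again.
        "Cache-Control": f"max-age={max_age}",
        "Date": formatdate(usegmt=True),
        # Validation: after expiry the cache revalidates with
        # If-None-Match; an unchanged ETag means "304 Not Modified".
        "ETag": etag,
    }

def handle_conditional_get(request_headers, current_etag):
    """Origin-side handling of a cache's revalidation request."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304  # still valid: no body is re-sent
    return 200      # changed (or first fetch): full response

headers = make_response_headers('"v1"')
revalidated = handle_conditional_get({"If-None-Match": '"v1"'}, '"v1"')
changed = handle_conditional_get({"If-None-Match": '"v0"'}, '"v1"')
```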
131. As the very last step, you might still have to serve the “last mile” – the very last application access, e.g. the final item view or similar. Here, the user does hit your server