Rackspace Open Sources Atom Nuke, The Fast Atom Framework
Filed in Product & Development by Chad Lung | September 11, 2012 3:30 pm

What if you had a tremendous mountain of data, broken up and stored across thousands of servers, and your client wanted some specific portion of that data? You could assemble the whole mountain and send the whole thing to your client, leaving the client to pick out what's needed. But there are reasons you split it up in the first place: it's too big to store in one place or to transfer without interruption. You also manage the data for reasons such as security and privacy, so moving the whole mountain might not be a good idea.

What if you could create something as complex as this, with data in multiple formats from multiple origins stored across multiple servers but aggregated for multiple consumers, who could then repackage it for consumers of their own?

If you couldn't give your client a copy of all your data, you could ask the client to describe the specific data that's needed and then assemble only those items. However, if you had many clients, each with their own mountains of data, would you have to create a direct path from every consumer to every fragment of data they need?

What you need is an easy way to build a bridge that integrates any number of data origins with any number of data consumers. Enter Atom Nuke.
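To make the bridge idea concrete, here is a minimal Python sketch of a hub sitting between data origins and data consumers. Every name in it (`Entry`, `Bridge`, `subscribe`, `publish`) is hypothetical and only illustrates the shape of the idea; it is not Atom Nuke's API. Each origin publishes to the hub once, and each consumer registers once by describing the data it needs, so no origin needs a direct path to any consumer.

```python
# Hypothetical sketch of a publish/subscribe bridge; not the Atom Nuke API.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Entry:
    """One immutable piece of data published by an origin (illustrative only)."""
    origin: str
    category: str
    payload: str


Filter = Callable[[Entry], bool]


@dataclass
class Bridge:
    """Hub that every origin and every consumer connects to exactly once."""
    _routes: List[Tuple[Filter, List[Entry]]] = field(default_factory=list)

    def subscribe(self, wants: Filter) -> List[Entry]:
        """Register a consumer by describing the data it needs; returns its inbox."""
        inbox: List[Entry] = []
        self._routes.append((wants, inbox))
        return inbox

    def publish(self, entry: Entry) -> None:
        """Accept an entry from any origin and route it to every interested consumer."""
        for wants, inbox in self._routes:
            if wants(entry):
                inbox.append(entry)


if __name__ == "__main__":
    bridge = Bridge()
    usage_reports = bridge.subscribe(lambda e: e.category == "usage")
    audit_log = bridge.subscribe(lambda e: True)
    bridge.publish(Entry("compute", "usage", "vm-1 ran for 3 hours"))
    bridge.publish(Entry("identity", "login", "user-42 signed in"))
    print(len(usage_reports), len(audit_log))  # prints "1 2"
```

Adding an origin means one more caller of `publish`, and adding a consumer means one more `subscribe`; neither touches existing connections. Entries are frozen dataclasses, so one consumer cannot reassign fields on an entry that another consumer also holds.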
With Atom Nuke, no matter where your data originates and who consumes it, it could be this simple to think about.

Atom Nuke Simplifies Integration

We created Atom Nuke to give ourselves two kinds of power related to the high volumes of data produced by our Atom feeds:
- fission, making it easy to divide data in new ways
- fusion, making it easy to combine data in new ways

A six-way integration requires eighteen paths, connecting three data origins with three data consumers so each has direct and equal access. Adding one new origin or consumer requires adding many new paths.

Atom Nuke is an open-source collection of utilities built on a simple, fast Atom implementation that aims for a minimal dependency footprint. The Atom implementation has its own model and utilizes a SAX parser
and a StAX writer:
- SAX (Simple API for XML) makes it simple to read existing data.
- StAX (Streaming API for XML) makes it simple to stream data to and from applications.

With Atom Nuke providing a bridge, a six-way integration requires six paths, one from each of the three origins and three clients, with each path terminating at Atom Nuke. Adding one new origin or consumer requires adding one new path.

We designed our Nuke implementation for immutability, maximum simplicity and memory efficiency. Nuke also contains a polling event framework that can poll multiple sources. Each source may be registered with a configured polling interval that governs how often the source is polled during normal operation. That source may have any number of Atom listeners added to its dispatch list. These listeners will begin receiving events on the next scheduled poll.

Atom as a Building Block

Atom is a self-discoverable and generic syndication protocol. The Internet Engineering Task Force (IETF) describes Atom in several ratified Requests for Comments (RFCs):
- the Atom RFC (RFC 4287, The Atom Syndication Format)
- the Atom Paging and Archiving RFC (RFC 5005, Feed Paging and Archiving)
- the Atom Publishing Protocol RFC (RFC 5023)

The unique properties of the Atom specification have made it popular as a protocol for generic event distribution, syndication and aggregation. Using Atom as a common interchange format, event publishers add their domain-specific events to an Atom publication endpoint. Downstream, subscribers are notified of events they've pre-identified as relevant, controlling what they consume from potentially vast collections of published data.

Atom Nuke Within Rackspace
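Before turning to how Rackspace uses Nuke, the polling framework described above can be sketched in Python. This is a minimal sketch under stated assumptions, not Nuke's actual API (Nuke itself is written in Java): `parse_feed`, `on_category`, `PolledSource`, `PollingScheduler`, `register` and `tick` are hypothetical names. It streams Atom XML through the standard library's SAX parser into immutable entries, gives each registered source its own polling interval and dispatch list of listeners, and shows a category filter as one possible trigger. To keep the timing easy to follow, the caller passes the current time in milliseconds to `tick`; a real poller would drive it from a timer thread.

```python
# Hypothetical sketch of SAX parsing plus interval polling; not the Atom Nuke API.
import io
import xml.sax
import xml.sax.handler
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

ATOM_NS = "http://www.w3.org/2005/Atom"


@dataclass(frozen=True)
class AtomEntry:
    """Immutable view of one <entry>: its id, title and category terms."""
    id: str
    title: str
    categories: Tuple[str, ...]


Listener = Callable[[AtomEntry], None]


class _EntryHandler(xml.sax.handler.ContentHandler):
    """SAX handler that builds AtomEntry objects as parse events stream past."""

    def __init__(self) -> None:
        super().__init__()
        self.entries: List[AtomEntry] = []
        self._current: Optional[dict] = None  # fields of the <entry> being parsed
        self._field: Optional[str] = None  # "id" or "title" while collecting text
        self._text: List[str] = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri != ATOM_NS:
            return
        if local == "entry":
            self._current = {"id": "", "title": "", "categories": []}
        elif self._current is not None and local in ("id", "title"):
            self._field, self._text = local, []
        elif self._current is not None and local == "category":
            term = attrs.get((None, "term"))  # unprefixed attributes carry no namespace
            if term:
                self._current["categories"].append(term)

    def characters(self, content):
        if self._field is not None:
            self._text.append(content)

    def endElementNS(self, name, qname):
        uri, local = name
        if uri != ATOM_NS or self._current is None:
            return
        if local == self._field:
            self._current[local] = "".join(self._text).strip()
            self._field = None
        elif local == "entry":
            c = self._current
            self.entries.append(AtomEntry(c["id"], c["title"], tuple(c["categories"])))
            self._current = None


def parse_feed(xml_bytes: bytes) -> List[AtomEntry]:
    """Stream the document through a namespace-aware SAX parser."""
    handler = _EntryHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.entries


def on_category(term: str, callback: Listener) -> Listener:
    """Wrap a listener so it fires only for entries carrying the given category term."""
    def listener(entry: AtomEntry) -> None:
        if term in entry.categories:
            callback(entry)
    return listener


@dataclass
class PolledSource:
    """A feed source with its own polling interval and listener dispatch list."""
    fetch: Callable[[], bytes]
    interval_ms: int
    next_poll_ms: int = 0
    listeners: List[Listener] = field(default_factory=list)


class PollingScheduler:
    """Polls every registered source whose interval has elapsed."""

    def __init__(self) -> None:
        self.sources: List[PolledSource] = []

    def register(self, fetch: Callable[[], bytes], interval_ms: int) -> PolledSource:
        source = PolledSource(fetch, interval_ms)
        self.sources.append(source)
        return source

    def tick(self, now_ms: int) -> None:
        for source in self.sources:
            if now_ms < source.next_poll_ms:
                continue  # not due yet
            # Snapshot: listeners added during dispatch wait for the next poll.
            listeners = list(source.listeners)
            for entry in parse_feed(source.fetch()):
                for listener in listeners:
                    listener(entry)
            source.next_poll_ms = now_ms + source.interval_ms
```

Because each source keeps its own `next_poll_ms`, sources with different intervals can share one scheduler, and a listener appended to a source's `listeners` between ticks starts receiving entries on that source's next scheduled poll.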
Within Rackspace, the Cloud Integration team builds tools for all our software development teams to use. We need to provide high-quality tools, but we also need them to be easy to use and to work smoothly together so that we can encourage adoption throughout Rackspace.

Using Atom Nuke, we collect data from the Atom feeds supplied by Atom Hopper, another of our open-source tools. We then take that Atom data and feed it into several systems, including those that perform analytics on OpenStack deployments throughout our data centers. The analytics engine uses Nuke to collect the entire Atom feed data so it can be marshalled into a Hadoop cluster. By combining our Atom Nuke and Atom Hopper tools, we've enabled complete portability of data: we can combine Atom events with data from other sources such as RabbitMQ messages and Flume logs without requiring consumers of that data to deal with the complexities of interacting with those dissimilar sources.

Nuke Makes Working with Atom Easy

Atom Nuke excels as an Atom feed crawler, since you can poll multiple feeds from multiple endpoints as well as define polling intervals down to the millisecond. In addition, you can select events in response to specific triggers, such as when an Atom entry contains a subscribed category. However, Nuke is much more than a feed crawler: it can also create its own Atom feeds when needed.

We built Atom Nuke with Java, but we recently extended support to Python. Nuke is licensed under the Apache 2 license and was created by John Hopper, a software engineer on the Rackspace Cloud Integration team. We've created some tutorials to get developers started with Nuke.

Building with Boxes, Not Bricks

Writing about a different kind of atom in a world that was just beginning to understand atomic structure and atomic energy, H.G. Wells (1866-1946) imagined a future in which using the power stored within atoms transformed many aspects of human life: “I feel that we are but beginning the list.
And we know now that the atom, that once we thought hard and impenetrable, and indivisible and final and–lifeless–lifeless, is really a reservoir of immense energy. That is the most wonderful thing about all this work. A little while ago we thought of the atoms as we thought of bricks, as solid building material, as substantial matter, as unit masses of lifeless stuff, and behold! these bricks are boxes, treasure boxes, boxes full of the intensest force.”
H.G. Wells, The World Set Free, 1914

We're now at a similar point with the technology of our time. We have explored enabling technologies, such as Atom, and have begun fully using and building upon their capabilities, putting them to work in new ways to make new things possible. As we begin building with Atom Nuke, we're using Atom not as a brick, but as a treasure box, containing amazing possibilities for fission and fusion, dividing and combining data to make new applications possible. By making Atom Nuke and some of our other projects such as Atom Hopper available as open source, we hope we are also creating treasure boxes filled with ideas and possibilities.

To learn more about Atom Nuke, visit our project site and check out the source code on GitHub.

Endnotes:
1. [Image]: http://ddf912383141a8d7bbe4-e053e711fc85de3290f121ef0f0e3a1f.r87.cf1.rackcdn.com/atom-nuke-inall-outall.png
2. Atom Nuke: http://atomnuke.org/