What is Data-Blitz?
Data-Blitz is a processing platform for entrepreneurs and organizations alike that require extremely high application service levels (i.e. throughput and availability) but lack the personnel and resources to achieve them. The Data-Blitz platform uses and exposes modern processing techniques in a consistent way. Data-Blitz separates application composition from implementation, and provides a language-neutral application processing environment that supports security, stability and simplicity at scale.
Data-Blitz utilizes modern processing techniques and components pioneered by companies such as LinkedIn and Twitter. Data-Blitz is a framework that allows organizations to operationalize what is termed "Big Data". Data-Blitz operationalizes the complete life cycle of "Big Data" applications, including building, testing, patching, versioning, securing, hardening and deploying applications, whether to bare metal, to mainstream cloud vendors, or to on-site data centers. Data-Blitz driven applications are capable of scaling to millions of concurrent users, providing real-time queries over petabytes of data, and processing complex logic in under 10 milliseconds. Data-Blitz is completely instrumented with Dev/Ops, elastic clustering, cloud security, runtime dashboard monitoring and other useful utilities, right out of the box.
[Figure: Data-Blitz Processing Pattern, showing content routing into real-time compute and real-time data storage, with eventually consistent long-term data storage]
Data-Blitz approach
Development
Developing "Big Data" driven applications can be daunting, requiring development organizations to retool their processes and skill sets. Data-Blitz offers a set of predefined processing patterns and tools which provide developers with all the resources needed to be highly productive in a short amount of time. Data-Blitz also allows the developer to choose from a wide variety of implementation languages. Currently the Data-Blitz platform supports Java and other JVM languages such as Scala and Clojure, as well as Node.js, Go, C/C++, Ruby and Python.
Data-Blitz Services
Data-Blitz takes a platform-as-a-service approach. The Data-Blitz platform offers developers and architects a set of component models, which can be molded to represent any application. The Data-Blitz stack is a set of core services.

All Data-Blitz core services are represented in separate clusters, so effectively the Data-Blitz platform is a cluster of clusters. The Data-Blitz workbench, called InfiniteEleven, distributes application implementations across the core services. The Data-Blitz core service clusters are built using best-of-breed open-source components. Data-Blitz natively supports the following open-source components as core services.
Data-Blitz Service Clusters
• Kafka cluster: Messaging
• Content router cluster: Content Routing
• Storm cluster: Processing
• Hadoop cluster: Long Term Database
• Couchbase cluster: Real-Time Database
• ZooKeeper cluster: Coordination
• Dev/Ops applications
Architectural Schemes
Application Channel
Application Channel is a stream processing pattern. Clients connect to Data-Blitz through a route. The route connects a messaging source to a messaging sink. The Application Channel connects the route to a processing executor, and the processing executor comes in many flavors.
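As a rough illustration of the source-route-executor-sink relationship described above, consider the sketch below. The `Route` class and `uppercase_executor` are hypothetical names invented for this sketch, not Data-Blitz APIs.

```python
from queue import Queue

def uppercase_executor(message):
    """A trivial processing executor: transforms each message."""
    return message.upper()

class Route:
    """Connects a messaging source to a messaging sink through an executor."""
    def __init__(self, source, sink, executor):
        self.source, self.sink, self.executor = source, sink, executor

    def pump(self):
        """Drain the source, process each message, deliver it to the sink."""
        while not self.source.empty():
            self.sink.put(self.executor(self.source.get()))

source, sink = Queue(), Queue()
source.put("claim received")
Route(source, sink, uppercase_executor).pump()
print(sink.get())  # -> CLAIM RECEIVED
```

In a real deployment the source and sink would be messaging clusters rather than in-process queues, but the shape of the pattern is the same.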
Data Normalizer
The Data Normalizer accepts unstructured and semi-structured data, then translates the data into a structured format. This is particularly useful when trying to integrate relational databases with unstructured or differently structured data sources.
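A minimal sketch of the normalizer idea follows. The target record shape (`source`, `level`, `message`) and the input formats are invented for illustration and do not reflect any actual Data-Blitz schema.

```python
import json
import re

def normalize(raw):
    """Translate a raw input into a structured {source, level, message} dict."""
    # Semi-structured case: a JSON fragment.
    try:
        doc = json.loads(raw)
        return {"source": doc.get("src", "unknown"),
                "level": doc.get("lvl", "INFO"),
                "message": doc.get("msg", "")}
    except (json.JSONDecodeError, TypeError):
        pass
    # Unstructured case: a "[LEVEL] source: message" log line.
    m = re.match(r"\[(\w+)\]\s+(\w+):\s+(.*)", raw)
    if m:
        return {"source": m.group(2), "level": m.group(1), "message": m.group(3)}
    # Fallback: keep the raw text so nothing is lost.
    return {"source": "unknown", "level": "INFO", "message": raw}

print(normalize('{"src": "billing", "lvl": "WARN", "msg": "late claim"}'))
print(normalize("[ERROR] ingest: malformed record"))
```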
Data Distributor
The Data Distributor receives streams of data and distributes the data across the enterprise.
Sometimes the data needs to be normalized differently across the different enterprise data endpoints.
Log Sink
The Log Sink processing pattern accepts data in multiple formats, from multiple sources. The Log Sink then distributes the incoming log data across third-party analytics providers, potentially translating content differently for each.
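The fan-out-with-per-provider-translation idea can be sketched as follows; the provider names and payload shapes are invented purely for illustration.

```python
def fan_out(record, translators):
    """Translate a record once per provider and return each provider's payload."""
    return {name: translate(record) for name, translate in translators.items()}

# Each hypothetical provider gets its own translation of the same record.
translators = {
    "provider_a": lambda r: {"event": r["message"], "severity": r["level"]},
    "provider_b": lambda r: f'{r["level"]}|{r["message"]}',
}

payloads = fan_out({"level": "WARN", "message": "late claim"}, translators)
print(payloads["provider_b"])  # -> WARN|late claim
```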
External Facing Interfaces: HTTP/HTTPS, Async/Sync, RESTful, Binary, JSON, Batch, XML, JMS
Messaging: Kafka, Kestrel, ZeroMQ, Custom
Processing Cluster: Apache Storm, Apache Trident, Command
Coordination: Apache ZooKeeper
Real-Time Databases: Couchbase, MongoDB, CouchDB, Cassandra, MySQL, MS SQL Server, Oracle, VoltDB
Long Term Storage Databases: Hadoop, YARN
Application Languages: Java, Node.js, Scala, C/C++, Go, Clojure, Python, Ruby
Dev/Ops: Puppet, Chef.io
Modeling
Creating applications that are composites of many different run-time components can be daunting and error prone. Data-Blitz provides a unified model which abstracts away the finer details of each component into a combined Domain Specific Language (DSL). Creating and deploying a Data-Blitz application requires only a single configuration artifact (i.e. a JSON file). The Data-Blitz Deployer, Infinite 11, accepts the Data-Blitz DSL file and provisions all composite run-times across all of the Data-Blitz clusters. The Data-Blitz processing framework, along with its workbench, was designed to create and deploy applications within 2 to 4 weeks, as opposed to 6 months to a year. Data-Blitz does this with Infinite 11, a rich graphically driven component repository that exploits reuse at amazing levels. Data-Blitz also supports mainstream IDEs (Eclipse, IntelliJ, NetBeans and WebStorm) through plug-ins. The Data-Blitz IDE plug-ins perform all the ancillary plumbing and repository interactions needed to create and test new Data-Blitz components within the Data-Blitz DSL.
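The DSL schema itself is not published here, so the artifact below is purely an invented illustration of what a single JSON deployment file for a channel-style application might look like; every key and value is an assumption.

```json
{
  "application": "claims-ingest",
  "channel": {
    "source": {"type": "kafka", "topic": "claims-in"},
    "executor": {"type": "storm", "topology": "normalize-claims"},
    "sink": {"type": "couchbase", "bucket": "claims"}
  },
  "longTermStorage": {"type": "hadoop", "path": "/archive/claims"}
}
```

The point of such an artifact is that one file names every composite runtime, so the deployer can provision each cluster without per-component install scripts.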
Security
With great power comes great responsibility. Data-Blitz offers a "Big Data" security model that guarantees application security across the entire Data-Blitz stack, deployed across an arbitrary set of computers. Security is broken down into two policies: data in motion and data at rest. Data-Blitz security can exist separately or be integrated with existing enterprise security offerings. Data-Blitz security is deployed and specified through a set of pre-made Dev/Ops recipes (i.e. Chef.io and Puppet).
DevOps/Deployment
The Data-Blitz architecture and component organization were designed with DevOps in mind. Each component in the Data-Blitz DSL can be specified and configured via Chef.io and/or Puppet (the two most popular DevOps platforms). Data-Blitz deployment is completely instrumented using DevOps. This implies that the entire Data-Blitz deployment stack, including all its separate run-time components (i.e. Messaging, Compute, Coordination and Persistence), is automatically installed, configured, tested and started, without requiring specific knowledge of the install techniques and configuration options of each runtime component. The Data-Blitz DevOps scripts can also be modified, allowing specific tuning of each runtime component. The out-of-the-box Data-Blitz DevOps default settings satisfy roughly 90% of deployments.
Training and Culture Buy-in
Many software organizations have cultures that are the result of years of experience and lessons
learned using legacy techniques. Moving to a modern “Big Data” processing platform can be
intimidating and confusing. Many individuals wrestle with new concepts, which can appear to render
their existing methods of software development obsolete. The natural response is fear of losing their jobs, which creates friction in moving to something new. Data-Blitz offers a set of focused training on culture buy-in. It provides all the resources needed for individuals to have a clear path to success with "Big Data" using Data-Blitz. Data-Blitz training also provides clear, unambiguous content for understanding Data-Blitz processing. This includes specific training on each component in detail
(i.e. Messaging, Compute, Coordination and Persistence). This along with detailed documentation
related to the Data-Blitz DSL options, results in a speedy knowledge transfer, which instills
confidence, knowledge and acceptance of Data-Blitz processing and “Big Data” in general. We at
Data-Blitz understand that people are your most prized resource and culture is everything.
Processing
The Data-Blitz processing model uses best-of-breed open-source processing components at its core. The Data-Blitz DSL abstracts away all the gory details using intelligent default configuration. Each open-source component provides unprecedented value within its particular vertical space, but separately provides little commercial value. Data-Blitz provides the glue and plumbing to orchestrate these separate components into an end-to-end cohesive solution. Data-Blitz implements a version of the "Lambda" real-time processing architecture (see http://lambda-architecture.net), which Twitter pioneered over the last few years and open sourced in 2011 (see https://storm.apache.org). A single Data-Blitz deployment is a cluster of clustered components. Each component cluster offers separate service-level functionality, providing high availability through redundancy and high throughput through shared workload distribution. Each Data-Blitz deployment is broken down into clusters representing Content Routing, Messaging, Compute, Real-Time Data Storage, Long Term Data Storage, Application Management/Coordination and Security Enforcement. The Data-Blitz DSL handles provisioning and configuring all of the Data-Blitz clusters from a single artifact. The Data-Blitz Deployer, Infinite 11, handles interpreting the Data-Blitz DSL and provisioning each component in each cluster separately. This approach of delegated DSL interpretation lends itself well to custom DSLs focused on client-specific functionality, using nouns and verbs found in the client's specific domain. Each Data-Blitz cluster runs as a separate entity, each maintaining its own set of resources and namespaces. Data-Blitz clusters can easily be set up to securely replicate to other Data-Blitz clusters. Data-Blitz replication offers two different transport protocols: one for Data-Blitz clusters that share the same data center, and one for Data-Blitz clusters deployed in different geographic data centers. This implies that one Data-Blitz cluster can accept information from an outside source, say the public Internet, perform a set of general analytics, and automatically replicate information to other Data-Blitz clusters focused on specific functionality related to a
subset of the client's domain. For example, in the medical space, one Data-Blitz cluster could handle the ingestion of medical claims from hospitals and medical clinics, providing real-time alarming and fraud-detection analytics, while simultaneously replicating messaging content to a remote Data-Blitz cluster studying diabetes, another Data-Blitz cluster studying rheumatoid arthritis, and yet another Data-Blitz cluster that actually processes the claim. All this information distribution is handled automatically in real time without writing any code (i.e. by filling out a form on the Data-Blitz Workbench). Other systems achieve this functionality with copious amounts of fragile architecture, requiring numerous assumptions and constraints.
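The batch-plus-speed-layer split at the heart of a Lambda architecture can be illustrated with a toy merge-at-query-time sketch; all names and numbers below are invented for illustration.

```python
# Batch (long-term) view: recomputed periodically over the full data set.
batch_view = {"claims_processed": 10_000}
# Speed (real-time) view: incremental updates since the last batch run.
speed_view = {"claims_processed": 42}

def query(metric):
    """Serve a metric by merging the precomputed batch view with the
    incremental real-time view, as a Lambda architecture does at query time."""
    return batch_view.get(metric, 0) + speed_view.get(metric, 0)

print(query("claims_processed"))  # -> 10042
```

When the next batch recomputation finishes, the speed view is reset and its contents are absorbed into the batch view; queries stay correct throughout.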
Why Data-Blitz?
Is this you?
The way we did it in the past will not work anymore. There is just too much data, and too much processing. "Big Data" usage in your domain is game changing. Your competitor may already have some. You have a large investment in the old way of doing things. You have been waiting for the last two years for the right time to risk embracing the future, but the risk of failure is killing you and your boss. The future is now. Real-time, big-data-level processing is changing the world. Data-Blitz provides a common-sense approach to building modern applications. We empower your software engineers with all the tools, knowledge and plumbing needed to realize arbitrarily scaled applications in extremely short sprints.
Or is this you?
You are an entrepreneur with a great idea. If you are as successful as you can be, your application will have to scale to Facebook-like levels. Your first deployment has a limited budget. You would like an application architecture that is flexible enough to deploy to budget-level compute resources or to enterprise-level commodity compute resources, without changing the application domain logic. Also, your investors would like to see application demos every week, and they expect the application to be deployed into production within three weeks.
Data-Blitz makes sense on many levels
• Helps organizations grow and benefit from "Big Data" in a very short amount of time
• Avoids vendor lock-in
• Greatly reduces time to market
• Greatly reduces cost to maintain
• Optimizes hardware usage
• Lowers cloud vendor costs
• Provides a safe and functionally rich starting point to build amazing applications
• Provides a graphical workbench that supports the entire application lifecycle
• Provides deployment functionality for applications that logically span multiple runtimes
• Easily integrates with mobile platforms and legacy applications
At Data-Blitz we understand that humans use the Data-Blitz processing platform. Individuals who use Data-Blitz need a means to safely grow while meeting application service and schedule expectations. So whether you're a large company with tight schedules looking to move into modern processing, or a startup with an idea requiring Facebook-level scalability, security, availability and performance, Data-Blitz is worth looking at.
Licensing options
Data-Blitz offers licensing options that are tailored toward different organizations and their specific
needs and levels of sophistication. Many organizations maintain an IT development organization,
which develops most of their internal software. Other organizations want a more turnkey solution and
would prefer a more outsourced approach. And finally, some organizations would like a completely
outsourced approach, where we at Data-Blitz completely develop, deploy and maintain their
application. Below are 3 popular Data-Blitz licensing options used in the past. Data-Blitz is also open
to any other type of licensing option that empowers our clients to be successful.
Option 1: you own the code
This option is attractive to companies who maintain their own software development organizations.
Generally these organizations lack the bandwidth and experience to operationalize “Big Data”. These
organizations have personnel who understand modern “Big Data” processing concepts, but lack real
experience implementing them. Frankly, they don't have time to research and learn how the “Big
Data” puzzle fits together. These organizations need a safe starting point to base their “Big Data”
processing driven applications on. The Data-Blitz implementation uses accepted “Big Data”
processing patterns and is well documented, completely testable and deployable through
mainstream Dev/Ops tools. This licensing option implies that there is no recurring revenue stream to Data-Blitz in any way. In other words, this option is analogous to having Data-Blitz software engineers build Data-Blitz on your site; the only difference is that you have Data-Blitz functionality now and have reduced the risk of the most risky aspect of your architecture.
Option 2: you license Data-Blitz
This option is attractive to companies who favor a more supportive approach. With this option you
license Data-Blitz on a one-year basis. During that year you are eligible for any Data-Blitz core related
upgrades and/or patches. These license packages also include training and help-desk support components. Support has different levels based on the number of outstanding support tickets allowed and the expected turnaround time. This approach also allows companies to have direct input into features in new versions of Data-Blitz, and provides access to others in the Data-Blitz community. The actual licensing fees are commensurate with the size of the Data-Blitz deployment.
Option 3: we create, deploy and maintain your Data-Blitz driven
application
This option is attractive to companies who have extremely tight schedule expectations and wish to outsource all aspects of the Data-Blitz application lifecycle. Data-Blitz software engineers work in concert with your application stakeholders, partnering with them in designing, developing, deploying and maintaining your Data-Blitz driven application. Data-Blitz leverages its relationships with major cloud vendors to provide an economical turnkey solution.