(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014


Published on

The US government has built hundreds of applications that must be refactored to take advantage of modern distributed systems. This session discusses EzBake, an open-source, secure big data platform deployed on top of Amazon EC2 and using Amazon S3 and Amazon RDS. This solution has helped speed the US government to the cloud and make big data easy. Furthermore, this session discusses critical architecture design decisions made during the creation of the platform in order to add security, leverage future AWS offerings, and cut total operations and maintenance costs.
Sponsored by CSC

Published in: Technology

  1. 1. November 13, 2014 | Las Vegas | Matt Carroll, CTO, Defense & Intelligence, CSC
  2. 2. The problem: We faced a highly complex environment for a US Government customer that had a large dependency on legacy systems with a need to modernize quickly. •Apps: More than 400 apps within its enterprise •Data: More than 1,000 active data sources consuming data on the order of TBs daily •Users: Network supports over 230,000 daily users with mission and business needs •Network: Multiple networks deployed worldwide on multiple continents •Security: Every capability runs through a lengthy certification and accreditation process (4–6 months) •Metrics: Disparate activities across apps and data have left little quantitative data
  3. 3. Customer challenges: While we faced many challenges, it became clear early on that budget and ease of integration for apps must be our two driving forces. Budget: •Not enough money to transition every app to take advantage of Big Data or a distributed system •Outsourcing IaaS needs to be monitored for accounting, security, scale, etc. without complex software •Application elasticity is critical to understanding the true costs of operations and maintenance •Storage (data) is a much bigger cost than expected •Need to consolidate systems engineering support App migration is not simple: •Most apps are CRUD based: write a report, find a report •Security business logic is baked into each app •Number one question: Why can’t I choose the technology that best fits my app? •Cannot disrupt operations by any means! •Applications must reside on multiple networks and work together •Takes too long to get started: laying down databases, web tiers, etc. Security is the ultimate killer of time: The process around security became complicated, burdensome, and still insufficient to counter threats at scale.
  4. 4. Our mission: Our mission is to facilitate Big Data analytics across the enterprise by providing the tools necessary to align the work of the application engineer, analytic developer, and data scientist, freeing them to focus on end products, not infrastructure; we provide this through EzBake. •Big Data should be easy •Big Data should drive insight •Big Data should be ubiquitous •Big Data should be secure
  5. 5. EzBake: It’s all about making application transition easier! Rather than assembling your own big data stack, EzBake provides an integrated way to compose the different elements of your application: collecting, processing, storing, and querying data. Ease of application development: •Time to market of apps and reuse •Autodeployment and high-availability scaling •Integrated analytics and audit trails for logs, metrics, data access, and security events Built-in security layer: •Role-based access and complex policies •Down to the object / cell-level controls •Encryption in transit Data layer: •Ubiquitous data access (no stovepipes!) •Simplified streaming / batch analytics •Tailorable and technology agnostic •Abstracted index patterns Diagram labels: Data layer; Custom applications; Physical databases (MongoDB, Accumulo, PostgreSQL (RDS), Redis, HBase, Elasticsearch, Titan, + custom); Execution layer (Stream, Batch, Query, Events, + more); Security layer
  6. 6. Key features •Streaming ingest (Frack): Interface for building data flow topologies that abstract physical stream processors •Common services: Scaled and commonly used Thrift services, typically used during streaming ingest •Data persistence: Indexing patterns exposed as Thrift services that abstract the physical database •Distributed query: Both direct access to indices and aggregate query across the various data sets •Security: Both at the data persistence and user access layers •Batch analytics: Amazon Elastic MapReduce (EMR) abstractions that enable complex, multidimensional discovery •Deployment: Automated elasticity through a GUI-based deployment
  7. 7. Technology agnostic •Instead of a jack-of-all-trades index for free text search, geospatial search, etc., use mission-specific indices for specific application logic needs •Focus on storage patterns rather than database-specific operations, thereby enforcing data access standards across the enterprise •Allow for new cartridges for web frameworks including Node.js, Python, Ruby, etc. Each app has its own needs, and it is not on the platform builder to force the team into a particular technology, but rather to offer a solution that meets the use case
  8. 8. Easy to deploy and secure: The platform provisions and scales, like classic PaaS, and embeds data layer connections and security on Amazon EC2 •Developers pull down a sandbox from the collaboration environment to develop on their local box •App / service is output as a WAR and YML file (buildpack) •The app registration page allows engineers to deploy and register apps, data feeds, and services on the platform •EzDeployer supports dynamic resource management for all hosted capabilities and provisions through Amazon Elastic Compute Cloud (EC2)
  9. 9. App registration •Applications carry role-based access controls with human-inserted deployment authorization •Registration includes data feeds, services, batch jobs, and intents •Ability to assign other users as admin controllers through AWS Identity and Access Management (IAM) controls or other IdAM •Cuts down time to deploy and removes the need for app developers to write Puppet scripts •Build in account management policies for financial tracking of PaaS and IaaS costs Deploy with buildpacks securely through the application registration page and provide elasticity as a service by abstracting Amazon EC2 services
  10. 10. Lab76: Collaborative development •Speed start of development from weeks to hours by enabling a truly agile development environment •GitLab was exposed for source control and promoting the sharing of code across the enterprise through governance and oversight •Customized Redmine was exposed for task management and to allow task oversight and alignment •DevOps could clone an Amazon Virtual Private Cloud and stand up new environments in a day vs. months of setting up for each app or system The key to speeding transition was to remove redundancy; by providing a one-stop shop for dev tools (Git, Redmine, Jenkins), a means to share code and common development environments, we gained months back from each development team
  11. 11. Leveraging a data layer on SQL and NoSQL: The platform abstracts physical data stores and promotes storage patterns to enable ease of sharing, force object-level security, and provide the ability to plug and play databases. Breaking down disparate data stores: •That’s not to say we implement Big Data SQL •Instead, we have the model that binds app development, Big Data, and security •Focus developers on database abstractions extensible to any database So what? •Move to production with Big Data without impacting existing SQL-based production architecture (think PostgreSQL to RDS) •Brings data together across the enterprise, helping customers with disparate engineering teams build to a standard
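The storage-pattern idea on slide 11 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `ObjectStore`, `InMemoryStore`, `SqliteStore`, and `file_report` are hypothetical and are not EzBake interfaces (the deck says EzBake exposes its patterns as Thrift services, which are not shown). The standard-library `sqlite3` module stands in for a SQL backend; it is not PostgreSQL or RDS.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class ObjectStore(ABC):
    """Hypothetical storage pattern: store and fetch objects by id."""

    @abstractmethod
    def put(self, object_id: str, obj: dict) -> None:
        ...

    @abstractmethod
    def get(self, object_id: str) -> Optional[dict]:
        ...


class InMemoryStore(ObjectStore):
    """Stand-in for a NoSQL backend."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, object_id: str, obj: dict) -> None:
        self._objects[object_id] = dict(obj)

    def get(self, object_id: str) -> Optional[dict]:
        return self._objects.get(object_id)


class SqliteStore(ObjectStore):
    """Stand-in for a SQL backend (sqlite3 here, not PostgreSQL or RDS)."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE objects (id TEXT PRIMARY KEY, body TEXT)"
        )

    def put(self, object_id: str, obj: dict) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO objects (id, body) VALUES (?, ?)",
            (object_id, json.dumps(obj)),
        )

    def get(self, object_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM objects WHERE id = ?", (object_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def file_report(store: ObjectStore, report_id: str, title: str) -> Optional[dict]:
    """App logic written once against the pattern, not a specific database."""
    store.put(report_id, {"title": title})
    return store.get(report_id)
```

Because `file_report` depends only on the pattern, swapping `InMemoryStore()` for `SqliteStore()` requires no change to the app code, which is the plug-and-play property the slide describes.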
  12. 12. Distributed query: We distribute object-specific queries across disparate data sets exposed through the data layer while controlling access through the service and at the data level •Migrate off legacy data stores without disrupting production instances •Focus on object-based queries across many data sets as well as across Amazon VPC within an environment •Work with Cloudera to modify Impala to run against multiple data stores •Common access controls across multiple data sets So what? •Common method to discover data across many apps, great for BI tools and third-party apps like Palantir, Tableau, etc. •Decreases the duplication of storage across the enterprise through common indexing patterns
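As a rough sketch of slide 12, the function below fans one object-type query out across several data sets and applies two checks: a service-level check (may this caller query the data set at all?) and a data-level check (does the caller hold every tag on the record?). All names (`Record`, `DataSet`, `distributed_query`) are hypothetical, the subset rule for visibility is an assumption, and the Impala-based implementation the slide mentions is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Record:
    source: str                 # data set the record came from
    object_type: str            # e.g. "report"
    body: str
    visibility: FrozenSet[str]  # tags a caller must hold to see the record


# Hypothetical data set interface: return all records of an object type.
DataSet = Callable[[str], List[Record]]


def distributed_query(
    object_type: str,
    data_sets: Dict[str, DataSet],
    allowed_sets: FrozenSet[str],
    user_auths: FrozenSet[str],
) -> List[Record]:
    """Query every permitted data set and keep only visible records."""
    results: List[Record] = []
    for name, query in data_sets.items():
        if name not in allowed_sets:      # service-level access control
            continue
        for record in query(object_type):
            # Data-level access control (assumed rule: all tags required).
            if record.visibility <= user_auths:
                results.append(record)
    return results


def make_data_set(records: List[Record]) -> DataSet:
    """Wrap an in-memory list as a data set, for demonstration only."""
    return lambda object_type: [
        r for r in records if r.object_type == object_type
    ]
```

A caller sees one merged result list regardless of how many stores sit behind the data layer, and the same two checks apply to every store.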
  13. 13. Security becomes an API •All data is encrypted in transit •All transactions are authorized by the security service •All data is secured at the object level •Robust security service: scales horizontally and generates authorization tokens based on external IdAM properties •Internal group management service scales to trillions of groups and beyond •Compressed bit vector representation of data visibility and access authorizations speeds security computations Following several zero-day attacks, the enterprise is waking up to security but has no understanding of how to secure its Big Data platforms, a major reason many are not in production. Example (diagram): Bob has authorizations X, Y, and Z. Data is tagged as X, Y, and R. Query result: Sorry Bob! Only X and Y for you! Object-level security across all data stores through a common API will provide dramatic efficiencies as it decreases time to model data across multiple data stores
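The bit vector check on slide 13 can be sketched with plain Python integers as uncompressed bitmasks. This is an illustration under stated assumptions, not EzBake's implementation: `TagRegistry` and `can_see` are hypothetical names, the compression the slide mentions is omitted, and the rule that a caller must hold every tag on an object is assumed.

```python
from typing import Dict, Iterable


class TagRegistry:
    """Hypothetical helper that assigns each authorization tag a bit."""

    def __init__(self) -> None:
        self._bits: Dict[str, int] = {}

    def mask(self, tags: Iterable[str]) -> int:
        """Return an integer with one bit set per tag."""
        result = 0
        for tag in tags:
            # First use of a tag claims the next free bit position.
            bit = self._bits.setdefault(tag, len(self._bits))
            result |= 1 << bit
        return result


def can_see(user_mask: int, data_mask: int) -> bool:
    """True when every bit required by the data is held by the user."""
    return (data_mask & ~user_mask) == 0


registry = TagRegistry()
bob = registry.mask(["X", "Y", "Z"])

# Three objects, each carrying one of the tags from the slide.
objects = {
    "tagged X": registry.mask(["X"]),
    "tagged Y": registry.mask(["Y"]),
    "tagged R": registry.mask(["R"]),
}
visible_to_bob = [name for name, m in objects.items() if can_see(bob, m)]
```

With the tags from the slide, Bob's query returns the objects tagged X and Y and withholds the one tagged R. The check itself is a single AND plus a comparison, which is why a bit vector encoding speeds up authorization at scale.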
  14. 14. Metering and monitoring •JavaScript API for web apps, Thrift API for services, and REST for others •Improve application usability and usefulness by examining analytics on usage patterns •Diagnose issues with system, services, and apps •Determine cost allocation based on which agencies and organizations are using the system To bring back focus on understanding the environment, we needed the platform to provide a comprehensive visualization to monitor users, data, and services on AWS
  15. 15. Batch (Amino) •Removes complexity of Amazon EMR for the average engineer •Crowdsource microanalytics through analysts and engineers •Data agnostic •Not a black box •Fully scalable •Inherent cross-data source linked indexes •Encourages sharing of knowledge, discovery •Index built to support machine learning •Security considered up front: index is in Accumulo •Utilized AWS to enable rapid load-balancing to support demand based on data and usage Developers can write Amazon Elastic MapReduce (EMR) code to analyze data, but don’t know what to look for; the analysts know what to look for, but don’t know how to write code. Technology is not the problem. It’s enabling the analyst to effectively leverage technology and reuse it.
  16. 16. The impact: So what? What were the overall accomplishments to date? Well… •Time: The platform and the development model decreased the development time from 6–8 months to production to 3–4 weeks. •Lean and mean: Application teams went from being heavy on DevOps, security, and testing to smaller, more agile teams focused on mission-specific use cases. •Data shared: Legacy REST/SOAP interfaces have begun to die, time spent on sharing data is down significantly without impacting operations, and more apps have more access to data. •Money: Removal of redundant code and systems, faster app deployment, cuts in total storage costs, and a decrease in team sizes led to significant cost savings up front for the customer. Most importantly… We revectored teams back to their users, providing more capabilities in less time, thereby saving lives and protecting our country.