Compute Nodes
Six racks, each containing 37 servers, for a total of 222 servers. Each server is a DL360 G6 model with:
• Processor: Two Quad Core X5570 2.93GHz Intel Xeon CPUs
• Memory configuration (varies): 36GB to 144GB RAM
• Storage: Eight 500GB 6G SAS 7.2K 2.5in MDL Disk Drives
Management Nodes
One rack containing 6 management servers. Each server is a DL360 G6 model with:
• Processor: Two Quad Core X5570 2.93GHz Intel Xeon CPUs
• Memory: 144 GB RAM
• Storage: Two 146GB 6G SAS 15K Raid1 Disk Drives
SAN Disk Storage Array Enclosure
Two dual controller EVA 4400 Storage Arrays, each with eighty-four (84) 1TB FATA Disk Drives, for a total of 168 TB (RAID5 equivalent).
Internal network:
• SIPRnet connectivity: a dual-pair fiber cable to POD
• 2x Network Access Controller (NAC) at the edge of the DSC internal network
• Core aggregation network: pair of Cisco Nexus 5020 10GbE switches
• All compute servers interconnected with three Cisco 3750E leaf switches
• SAN is directly attached to all of the six Management Nodes using two 16 Port 1u Brocade Fibre Channel
Highlight cell level security
Tunable tradeoffs between consistency and latency, accommodating very high write throughput
Cloudbase focuses more on consistency, so it has a slower write throughput
No server-side security
Data Tactics dhs introduction to cloud technologies wtc
Data Tactics Corporation 2/12/2013
Cloud Overview
Intro to Cloud Technologies: Agenda
1. Intro to Data Tactics
2. Cloud
3. Data
4. Hardware
5. Example Cloud Solution
Data Tactics: Who Are We?
• Established in 2005
  – Created by a group of seasoned engineers, analysts and management specialists who have worked together for over thirty years
  – A Minority-Owned Small Business registered with CCR and ORCA
  – TS-Facility Clearance
• Locations in McLean, VA & Aberdeen Proving Ground, MD
  – Advanced Lab Facilities
  – Integrated Development, Test, Integration and Evaluation Facilities
  – Host six (6) clouds for the Army and DARPA
  – Demonstration Rooms
  – One Sponsored Certified and Accredited SCIFs
• Prime Contract Vehicles
  – Army RDECOM BAA
  – Army I2WD BAA
  – GSA Alliant Small Business
  – Subcontracts with several LSI firms across DoD and IC
• Certifications
  – ISO 9001 – Quality Management Systems (May 2010)
  – ISO 27000 – Information Management Security Systems (May 2010)
Data Tactics: What we do
• Data Architecture
  – Innovation and Design
  – Assessment and Benchmarking
  – Collaboration and Uniformity
• Data Engineering
  – Discovery, Ingestion, and Cleansing
  – Scientific Analysis
  – Large Scale Computation and Platforms
• Data Management
  – Security and Assurance
  – Infrastructure and Administration
  – Visualization and Dissemination
[Figure: Data Tactics Solutions Spectrum]
Data Tactics: Our Family
• Over One Hundred Fifty Employees
  – Leadership Team, Deeply Experienced, Very Successful, Rich in Relationships
  – 90% TS/SCI cleared, many with polygraph(s)
  – Employee retention near 90%
    • Steeped in and Dedicated to the Data Tactics Vision
    • High percentage of Military and Intelligence Community veterans
  – Personnel Certifications
    • ITIL V3 Foundation
    • PMI certified project managers
    • CISSP certified security managers
    • Cloudera Certified Engineers: 35% of Technical Staff
    • Software Certifications: Java, Solaris, Linux, Microsoft, Oracle, VMware, IRIX
    • Hardware Certifications: Riverbed, EMC, SUN, Dell
    • Architecture: SOA, DoDAF, other Modeling
  – 25% have Advanced Degrees and Doctorates
• Over 10% of Staff are “Data Scientists”
• Three WORLD class semantic researchers
Data Tactics: Cloud Experience
• 5 Clouds on SIPRNET
  – 3 at our secure facility in Tyson’s
  – GISA, Ft. Bragg
  – Afghanistan
• 4 at TS/SCI
  – AF TENCAP on JWICS
  – NRL on JWICS
  – DARPA
  – INSCOM
  – DSC (pending)
• Over a dozen at Unclassified/FOUO Level
  – Supporting real-world missions on contract
  – CX-I Cloud in Afghanistan
  – At various levels of complexity
• Cloud domains are where we live
  – Data is the Hard Problem
NIST Definition
• According to NIST: Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.
• Five essential characteristics of Cloud Computing:
  1. Broad Network Access: Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms
  2. Rapid Elasticity: Capabilities can be elastically provisioned and released
  3. Measured Service: Cloud systems automatically control and optimize resource use by leveraging a metering capability
  4. On Demand Self Service: A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.
  5. Resource Pooling: The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model.
NIST Definition (Cont)
• From a service model perspective, cloud is also divided into:
  – SaaS, PaaS, IaaS
• Software Services [SAAS]
  – A type of reusable bundle of functionality (typically business related and infrastructure related) which may be accessed by other cloud computing components, software or end users directly to create meta-applications. These bundles of functionality execute within the cloud.
• The Run Time Platform [PAAS]
  – A solution stack as a service that facilitates deployment and run-time of applications that require specialized run-time environments: J2EE (clustered), .Net (clustered), Web Technologies (basic servlet, Web Services), basic or clustered, Virtualization, HPC program, Grid
• The Cloud Infrastructure [IAAS]
  – Compute service, storage service, data parallelization service, remote access service, management service, and security service.
NIST Definition (Cont)
• Deployment Models are divided into:
  – Public, Private, Hybrid, Community
  – Public Cloud: The cloud infrastructure is provisioned for open use by the general public.
  – Private Cloud: The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units).
  – Hybrid Cloud: The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability.
  – Community Cloud: The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations).
Beyond NIST Definition
• Limited to three service models
  – IaaS, PaaS, SaaS
• Does not identify the capabilities of distributed systems
  – Big Data revolutionary capabilities
  – Linear scaling
  – Logarithmic reduction in processing times
  – Dramatic cost reductions to accomplish tasks formerly in the realm of traditional HPC
Evolution of Cloud in IC
• Evolution of Cloud in IC/DoD
  – JIOC-I
  – DCGS-A
    • Version 2
    • Version 3
    • Rainmaker
    • DSC
  – DARPA
  – I2 Pilot
• Customer Requirements drive Solution
  – Not cookie cutter
    • Budget, Data, Performance, Security, existing components are all drivers
IC Cloud ‘Flavors’
• Compute Cloud (or Data Cloud)
  – Handle 3Vs: volume, velocity, variety of data
  – Petabyte to Exabyte scale with linear scaling
  – Big Table like construct and supporting capabilities
    • Hadoop File System (HDFS)
    • MapReduce
    • Zookeeper
    • Accumulo
• Utility Cloud
  – Contains the suite of services/apps that interact with the Compute Cloud
    • Ozone Widget Framework (OWF)
    • Web Tier
  – Data Visualization Tools
    • Auditing
• Storage Cloud
  – Scalable storage for large files
  – Imagery, FMV
  – Shared Directories
• Most IC Cloud Implementations are a combination of all three flavors
Data and Utility Cloud Example
[Figure: four-layer cloud stack. Layer 4: Cloud Analytics. Layer 3: Cloud Services. Layer 2: Cloud Software. Layer 1: Cloud Hardware. Other labels in the figure: Hardware, Utility Cloud, GHOSTMACHINE.]
Cloud Stack Feature List Example
• User Capability
  – Thin Client Dataspace Retrieval via Query
    • Textual or geospatial query
    • Display with time-wheel, geo-spatial map display, link-graph display
  – Thin Client Search for Resolved Entities
    • Display with Document, Entity Graph, Timeline, Timewheel, Entity Viewers
  – User Upload of Analyst Products to Cloud
  – Persistent Data Query and Alerting (on ingest)
  – Integrated Chat Widget, and Widget for Querying External Systems
  – Uniform Widget Experience and Data Sharing between Widget Views
• Data Sourcing
  – Flexible, secure Data Ingest Architecture
  – Ingest processes three examples of data formats
    • Unstructured artifacts (e.g., free form report)
    • Semi-structured (e.g., email message)
    • Structured artifacts (e.g., RDB table, XML document)
Example: Cloud Stack Feature List (Cont)
• System Capability
  – Advanced Analytics for Entity Extraction, Resolution and Link Analysis
  – API for Retrieval of Artifacts, Metadata and Semantic Indices
  – API for Application Access to Analytics Results
  – Cloud-to-Cloud Integration
    • Between applications running inside the cloud, and
    • Between applications running inside the cloud and applications running outside it
    • Multiple messaging formats and tools supporting Inter Cloud integration for data, services, and resources
  – Cloud Computing Infrastructure, Scalable Storage with Double Redundancy
  – Security Infrastructure with Role and User Based Access Control
• Management Capability
  – Log Viewer App: Ability to Monitor Users and Activities in Cloud
  – User Management App: Ability to Define Access for Users
  – Ingest Monitor widget: Ability to Track Progress of Data Ingest
  – Bulk Export/Import App for Dataspace
  – Cloud Management System for Monitoring and Control
  – Workflow Management System
“The need to securely share actionable, timely, and relevant classified information among state, local, tribal, and private sector partners in support of homeland security is critical as we work together to address evolving threats,” said Secretary Napolitano.
Data – The Hard Part
Data Architecture
The data architecture is divided between 1) a Mission Dataspace, and 2) an Operational Dataspace
• Mission Dataspace
  – The primary business driver (or mission) for our customers is to support the Intelligence Community (IC) with a “solution for intelligence data integration, search, exploration, enrichment, exploitation, management, and sharing”
    • The data on which these activities are undertaken is referred to as mission data and is stored in the Mission Dataspace.
• Operational Dataspace
  – A location to persist operational data: data that is directly used/created by infrastructure application software to support its operation/execution, which in turn supports the mission
    • Includes input data (information entering the system for the first time from a system or end-user), work and message queues, temporary results, configuration files, and any purely transient information
  – Typically this data has a very narrow purpose (that of supporting a particular business or infrastructure application).
Dataspace can be implemented using 1) HDFS, 2) Cloudbase, 3) Cassandra, 4) MySQL, 5) FS (local, SAN), 6) Oracle (limited)
Unified Dataspace Example
The Wild
• Data sources with rich data & semantic context locked in domain silos
• Data tightly coupled to data-models
• Data-models tightly coupled to storage models
Silos isolated by
• Implementation technology
• Storage structure
• Data representation
• Data modality
[Figure: three-segment dataspace. Segment 3 - Model Description: Data Models, rich semantic context. Segment 2 - Data Description: Structured Data; Integration, Enrichment, Exploitation, Exploration across all sources. Segment 1 - Artifact Description: Unstructured Data, rich data context.]
Mission Dataspace Data Model
• Structure
  – Segment 1: Artifact Description Framework (ADF)
    • Universal store for unstructured data (documents)
    • Indexes
  – Segment 2: Data Description Framework (DDF)
    • Universal store for structured data (entities, attributes, relationships)
  – Segment 3: Model Description Framework (MDF)
    • Universal store for data / knowledge models
  – Reference Data
    • Used to “normalize” data in other segments
    • Used to support business functionality (e.g., lists of alternative name spellings for search, dictionaries)
  – Inverted Indexes
    • Specialized indexes to support business functionality (search, analytics)
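The inverted indexes mentioned above can be illustrated with a minimal sketch. This is not the DSC implementation; the function names, the whitespace tokenizer and the sample artifacts are all assumptions made for illustration. It maps each token to the set of artifacts that contain it, then answers a query by intersecting those sets.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"


def build_inverted_index(artifacts):
    """Map each lowercase token to the set of artifact ids containing it."""
    index = defaultdict(set)
    for artifact_id, text in artifacts.items():
        for raw in text.lower().split():
            token = raw.strip(PUNCTUATION)
            if token:
                index[token].add(artifact_id)
    return dict(index)


def search(index, query):
    """Return ids of artifacts that contain every query token (AND semantics)."""
    tokens = [t.strip(PUNCTUATION) for t in query.lower().split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample artifacts, for illustration only.
artifacts = {
    "doc1": "Report on Washington, the city.",
    "doc2": "Washington crossed the river.",
}
index = build_inverted_index(artifacts)
both = search(index, "washington")
river_only = search(index, "Washington river")
```

A production index would also handle stemming, stop words and the alternative name spellings that the Reference Data bullet mentions; the sketch leaves those out.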
Data Description Framework (DDF)
• DDF looks at data in the following ways
  – Mention: A chunk of data, either physically located within a tangible artifact, or contained within an analyst’s mind
    • E.g., “Washington” at offset x in file Y
  – Sign: A representation of all disambiguated mentions that are identical except for their indexicality
    • E.g., “Washington”
  – Concept: An abstract idea, defined explicitly or implicitly by a source data-model
    • E.g., City, Person, Name, Address, Photo
  – Predicate: An abstract idea used to express a relationship between “things”
    • E.g., isCity, isPerson, hasName, hasAddress, hasPhoto
  – Term: A disambiguated sign abstracted from the source artifact or asserting analyst
    • E.g., Washington Person; Washington Location
  – Statement: Encodes a binary relationship between a subject (term) and an object mediated by a predicate
    • E.g., [Washington, Person] hasPhoto [GeorgeWashingtonImage.jpg]
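One way to picture how the DDF elements relate is as a handful of small record types. The sketch below is only an illustration built from the definitions on this slide; the class layout, field names and the sample file names are assumptions, not the DDF schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A chunk of data at a specific place in an artifact."""
    text: str
    artifact: str
    offset: int


@dataclass(frozen=True)
class Term:
    """A disambiguated sign: the surface form plus the concept it denotes."""
    sign: str
    concept: str


@dataclass(frozen=True)
class Statement:
    """A binary relationship: subject term, predicate, object."""
    subject: Term
    predicate: str
    obj: object


def sign_of(mention):
    """Mentions that differ only in location share one sign."""
    return mention.text


# Two mentions in different (hypothetical) files collapse to the same sign.
m1 = Mention("Washington", "reportA.txt", 120)
m2 = Mention("Washington", "reportB.txt", 7)

# The same sign can resolve to different terms.
washington_person = Term("Washington", "Person")
washington_location = Term("Washington", "Location")

# The statement from the slide: [Washington, Person] hasPhoto [GeorgeWashingtonImage.jpg]
photo = Statement(washington_person, "hasPhoto", "GeorgeWashingtonImage.jpg")
```

Keeping the concept inside the term is what lets "Washington the person" and "Washington the location" carry separate statements even though their sign is identical.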
Operational Dataspace
• End user storage (documents, preferences, products)
• System events/traps
• Performance/resource utilization metrics/history
• Application log messages
• Messaging infrastructure message persistence
• Data Surveillance: watch patterns, subscriptions and notification profiles. May also need some working space
• Temporary indexes as well as final index sections (shards)
• Persistence of distributed state in case of total failure
• Directory for digital certificates (LDAP)
• Directory for security authorizations (LDAP)
• Security audit events
• Threat assessment results
• Vulnerability assessment results
• “Scratch” area used by various applications
• Working area to move files in/out of cloud
• Policies, rules, configurations, etc.
• CM repository
Examples
1. What are your requirements for Cloud Computing?
   1. Integrate Federated Workforce into Headquarter Business Processes
      1. How many?
   2. Enterprise Storage Capabilities
      1. For HQ or regions across the world/country?
   3. Provide Analytics for discovering and creating knowledge
   4. Sharing Information
2. What are the handling requirements for your data?
   1. Classified/LES
   2. US Persons
   3. Title 6, 10, and/or 50
   4. ICD 501/503
   5. MOUs
3. What is the anticipated security level associated with your cloud vision?
   1. PL2, PL3, PL4???
4. What are the complexities associated with your data in its current state?
   1. Unstructured documents on shared drive
   2. Structured in legacy main frame
   3. Semi-Structured documents with strict handling procedures (stored in ECM?)
   4. Amount of Data (GB vs TB)
5. What is your budget?
   1. Open Source vs Open Source and COTs solution
6. What is your timeline?
   1. Solution can be driven by speed of delivery vs functional requirements, as an example.
      1. Leverage existing cloud solutions as a starting point, rather than a final product
7. What components (Software & Hardware) are available for reuse?
   1. Servers, SANs, Networking gear
   2. Meta Data Extractors
   3. One Way Guards
   4. VM licenses
A Real-World Example: Building up the Cloud
Distributed Common Ground System – Army (DCGS-A) Standard Cloud (DSC)
Business Need for DSC
Break the Data Barriers
• End data silos and their proliferation
• Provide a universal data storage and computational fabric
• Make data ingest faster and simpler
• Allow data to be endlessly reshaped / reused
• Search, enrich, integrate/fuse, exploit within and across all data sources and domains
Achieve Previously Unachievable Scale
• Go bigger, faster, larger
• Realize a truly large-scale data store
• Embrace an unbounded diversity of data, processing, and applications
• Achieve orders of magnitude greater processing power
• Expose familiar usage metaphors (e.g. Google, Amazon)
Stop Moving Data, Start Using Data
• Ingest once, reuse endlessly
• Move computation to the data (and not data to the computation)
• Build highly sophisticated exploitation tools and applications
• Create quick mashups and mission applications
• Surf around and explore the entire Intel Dataspace
• Connect all the dots in any way that makes sense from any mission perspective
• Change your mind and do it again, and again… in new ways without messing up what you already have
Get More Bang for the Buck
• Deploy using fully automated procedures
• Avoid almost all SW licenses
• Stay up and running with an inherently robust design that uses commodity HW
Do New Science and Develop New Practice Around Intelligence
• Explore data and processing at entirely new scale and discover new insights and phenomena
• Cultivate an ever growing, increasingly rich, and productive Dataspace
Bridge the whole IC and all Services with an open data and processing capability – the Dataspace
DSC Software Stack
[Figure: layered software stack. Rows are listed top to bottom as they appear; the figure groups them under the labels Client, Services, Software As a Service, Platform As a Service, and Infrastructure As a Service.]
• Client: Ozone Widgets
• Services: V3 / MFWS / DIB / GeoSpatial / BC
• PREFS / OWF / Safemove / OpenFire
• GeoServer / Element Index / AntiVirus / AIDE
• DSMS / ASLI / ActiveMQ / Alerting
• JVM / Apache HTTP/Proxy / Tomcat
• Cloudbase / Katta / Zookeeper
• MapReduce / HDFS / Flume / Oozie
• Logging / Auditing / Nagios / Ganglia
• Condor / Cloud Management System
• Linux / LDAP / MySQL / CAS
• DNS / DHCP / NFS / NTP
• HPSA (Puppet?) / HPNA
• Servers / SAN / Network / Facilities
Facility
• The DSC System production hardware is housed in a single twenty foot Performance-Optimized Datacenter (POD)
• The POD is configured to maximize its hardware payload while taking into consideration
  – Overall power availability
  – Individual device power consumption and power dissipation
  – Individual device weight
  – Individual device heat generation
Infrastructure – Hardware Profile
• Two rack types
  – Compute: 222 servers
  – Management: 6 servers
• 1,824 cores
• ~100,000 MIPS (assumed Java 50 CPI)
• 1.035 PB disk storage (raw)
• 13.92 TB physical memory (RAM)
• Environmental support
  – Active power w/backup generator
  – Two live coolers w/backup cooler
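The core, MIPS and memory totals above can be reproduced with simple arithmetic. The sketch below is a check under stated assumptions: "50 CPI" is read as 50 clock cycles per instruction, a TB is taken as 1024 GB, the compute node memory mix (25 x 144 GB, 75 x 72 GB, 122 x 36 GB) comes from the compute server profile, and the 144 GB per management node comes from the speaker notes. The disk total is not recomputed here.

```python
# Server counts and per-server specs taken from the deck.
COMPUTE_SERVERS = 222
MANAGEMENT_SERVERS = 6
CORES_PER_SERVER = 8          # two quad-core X5570 CPUs
CLOCK_MHZ = 2930              # 2.93 GHz
ASSUMED_CPI = 50              # assumption: "Java 50 CPI" = 50 cycles per instruction

total_servers = COMPUTE_SERVERS + MANAGEMENT_SERVERS
total_cores = total_servers * CORES_PER_SERVER

# Millions of instructions per second across all cores.
total_mips = total_cores * CLOCK_MHZ / ASSUMED_CPI

# Memory: (node count, GB per node) for each compute tier, plus management nodes.
compute_ram_gb = sum(nodes * gb for nodes, gb in [(25, 144), (75, 72), (122, 36)])
management_ram_gb = MANAGEMENT_SERVERS * 144
total_ram_gb = compute_ram_gb + management_ram_gb
total_ram_tb = total_ram_gb / 1024  # assumption: 1 TB = 1024 GB

print(f"cores: {total_cores}")
print(f"MIPS:  {total_mips:,.0f}")
print(f"RAM:   {total_ram_tb:.2f} TB")
```

Under these assumptions the core and memory figures match the slide, and the MIPS estimate lands slightly above the rounded ~100,000 quoted there.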
Compute Server - Profile
• Processor:
  – Two Quad Core X5570 2.93GHz Intel Xeon CPUs -> 8 cores per server
• Memory configuration (varies):
  – 25 (of 222) nodes with 144 GB Memory via 18 8GB DIMMs [approx: $14K]
  – 75 (of 222) nodes with 72 GB Memory via 18 4GB DIMMs [approx: $10K]
  – 122 (of 222) nodes with 36 GB Memory via 18 2GB DIMMs [approx: $8K]
• Storage:
  – Eight 500GB 6G SAS 7.2K 2.5in MDL Disk Drives
  – RAID 5
• Power:
  – N+N 750W Power Supplies
• Approximate cost: 25 x $14K + 75 x $10K + 122 x $8K = $2.076M
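The cost total and the DIMM configurations on this slide can be checked the same way. The tuple layout and variable names below are illustrative; the numbers are the slide's own approximate per-node prices.

```python
# (node count, DIMMs per node, GB per DIMM, approximate cost per node in USD)
TIERS = [
    (25, 18, 8, 14_000),
    (75, 18, 4, 10_000),
    (122, 18, 2, 8_000),
]

total_nodes = sum(count for count, _, _, _ in TIERS)
memory_per_node_gb = [dimms * gb for _, dimms, gb, _ in TIERS]
total_cost_usd = sum(count * cost for count, _, _, cost in TIERS)

print(f"nodes: {total_nodes}")
print(f"memory per node (GB): {memory_per_node_gb}")
print(f"total cost: ${total_cost_usd / 1e6:.3f}M")
```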
Network Architecture
Key design features:
• Separation of the mission and management/operational data to ensure security and performance of the solution
  – Using VLANs
• Connection to the DSC cloud will be restricted to entry point nodes for a single security choke point (Cloud Access Point Nodes).
  – Greatly simplifies boundary security
  – The POD internal network will be non-routable (10.x.x.x) with external access only through the two entry points
• Redundant paths from servers and enclosures via stacking cables to redundant switches at the core provide resiliency against core switch failures as well as cabling faults.
  – Each node has access to two independent switches in the enclosure.
Network Architecture (cont’d)
[Figure: network diagram. Labels: Single access point through NAC; Leaf Switches interconnect all nodes; SAN connected to all 6 mgmt nodes; Compute Rack; Management Rack; 10GbE Core Switches.]
Definition: Node Types
• Cloud Head Node:
  – These nodes are responsible for executing the various cloud service “masters”
  – These masters are collocated because many of them work together. One node may be responsible for running the Mission Dataspace version of these services, and another node may support the Operational Dataspace. A third node may act as a failover.
• Cloud Access Point Node:
  – These nodes host the Web Infrastructure Services (web-servers and proxies) and portions of the Application and Systems Integration Subsystem, and act as a physical gateway into the cloud.
Definition: Node Types (cont)
• Cloud Infrastructure Nodes:
  – These nodes are divided into two categories:
    • Low-level infrastructure applications such as DNS, DHCP and NTP (part of Core Infrastructure Services).
    • Cloud service workers corresponding to the Cloud Head Nodes (see above)
• Cloud Management Nodes:
  – These nodes run general purpose applications used to manage the cloud such as: Identity and Access Management Subsystem, Cloud Management System, Map Server, Chat Server, Cloud Logging Subsystem, and Cloud Monitoring and Metering.
• Cloud Client Node:
  – These run business applications that use cloud services such as Ingest and Analytics.
Definition: Node Types (cont)
• HDFS – Hadoop Distributed File System
  • HDFS Master / NameNode
    – Executes the file system namespace operations such as reading, writing, renaming and deleting files.
    – The NameNode server is responsible for mapping file blocks onto the DataNode servers.
  • HDFS Worker / DataNodes
    – Functions include storing and retrieving file blocks from the native operating system file system.
    – Coordinates with the NameNode to perform block creation, deletion and replication.
    – Called by MR Jobs to serve read/write requests in a highly distributed manner.
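The split of responsibilities between the NameNode (block-to-node mapping) and the DataNodes (block storage) can be sketched in a few lines. This is a toy model, not the Hadoop API: `ToyNameNode`, the round-robin placement, the 4-byte block size and the in-memory `storage` dictionaries are all assumptions chosen to keep the example small. In real HDFS the client streams block data to the DataNodes directly; here the toy NameNode does the copying for brevity.

```python
import itertools


class ToyNameNode:
    """Keeps only metadata: which blocks make up a file and where replicas live."""

    def __init__(self, data_nodes, block_size=4, replication=2):
        if replication > len(data_nodes):
            raise ValueError("replication cannot exceed the number of data nodes")
        self.data_nodes = list(data_nodes)
        self.block_size = block_size
        self.replication = replication
        self.block_map = {}  # filename -> [(block_id, [node, ...]), ...]
        self._next_start = itertools.cycle(range(len(self.data_nodes)))

    def create(self, filename, data, storage):
        """Split data into blocks and place each block on `replication` nodes."""
        blocks = []
        for i in range(0, len(data), self.block_size):
            block_id = f"{filename}#{i // self.block_size}"
            start = next(self._next_start)
            nodes = [
                self.data_nodes[(start + k) % len(self.data_nodes)]
                for k in range(self.replication)
            ]
            for node in nodes:
                storage[node][block_id] = data[i:i + self.block_size]
            blocks.append((block_id, nodes))
        self.block_map[filename] = blocks

    def read(self, filename, storage, failed=()):
        """Reassemble a file, skipping replicas on failed nodes."""
        chunks = []
        for block_id, nodes in self.block_map[filename]:
            live = [n for n in nodes if n not in failed]
            if not live:
                raise IOError(f"no live replica for {block_id}")
            chunks.append(storage[live[0]][block_id])
        return b"".join(chunks)


node_names = ["dn1", "dn2", "dn3"]
storage = {name: {} for name in node_names}   # stands in for DataNode disks
name_node = ToyNameNode(node_names)
payload = b"hello hdfs world"
name_node.create("report.txt", payload, storage)
```

With two replicas per block, the file stays readable after any single DataNode is lost, which is the property the replication bullet above is describing.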
Structured Storage Service
• Cloud Structured Storage Service
  – Responsible for providing a highly scalable and highly available logical structured storage capability, similar to what is traditionally known as a database
  – Can support BILLIONS of rows and MILLIONS of columns
  – Columns can be added at run-time to accommodate data
  – Based on
    • NSA Cloudbase GOTS
    • Cassandra
Cloudbase
• Cloudbase is a Java based, distributed database, based on the Google BigTable design, created at the NSA
• Based on a Master/Worker model
  – One Master (daemon): keeps the overall metadata
  – Multiple Workers (TabletServer): store database tables in HDFS
• Uses HDFS to
  – Store tables (data)
  – Store recovery logs, write-ahead logs
• Supports Cell Level Security
  – Security markings defined by the application, stored and enforced by Cloudbase
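Cell level security means the marking travels with each individual cell and the store itself filters what a reader sees. The sketch below illustrates that idea only. It is not the Cloudbase API: `ToyCellStore`, the marking format (a set of labels, all of which the reader must hold) and the sample labels are assumptions for illustration.

```python
class ToyCellStore:
    """Stores a security marking with every cell and enforces it on read."""

    def __init__(self):
        # (row, column) -> (value, frozenset of required labels)
        self._cells = {}

    def put(self, row, column, value, labels=()):
        """Write a cell; the application chooses the marking."""
        self._cells[(row, column)] = (value, frozenset(labels))

    def scan(self, row, authorizations):
        """Return only the cells whose required labels the reader holds."""
        held = set(authorizations)
        return {
            column: value
            for (r, column), (value, labels) in self._cells.items()
            if r == row and labels <= held
        }


store = ToyCellStore()
store.put("person:42", "name", "Washington")                      # unmarked cell
store.put("person:42", "address", "Mount Vernon", labels={"LES"})
store.put("person:42", "source", "report-17", labels={"LES", "ORCON"})

public_view = store.scan("person:42", authorizations=[])
les_view = store.scan("person:42", authorizations=["LES"])
full_view = store.scan("person:42", authorizations=["LES", "ORCON"])
```

Because the filter runs inside the store, two users querying the same row can receive different columns without the application writing any per-user logic.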
Cassandra
• Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store, implemented as a Distributed Hash Table (DHT)
• Datastore for Synthesys and Hypergraph analytics
• P2P distribution model, which drives the consistency model, means there is no single point of failure
  – Each peer server is responsible for a portion of a Distributed Hash Table (range of keys)
• Writes directly to the local file system
  – No HDFS
[Figure: ring of peer servers. Labels: Portion of the keyspace; Keeps track of all members in the cluster.]
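The idea that each peer owns a range of keys in a Distributed Hash Table can be sketched with a simple hash ring. This is a generic consistent-hashing illustration, not Cassandra's partitioner or client API; `ToyRing`, the use of MD5 for token placement and the node names are assumptions made for the example.

```python
import bisect
import hashlib


def token(value):
    """Place a node name or key on the ring as a large integer."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Each node owns the keys between its predecessor's token and its own."""

    def __init__(self, nodes):
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key):
        """Walk clockwise from the key's token to the first node token."""
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]


nodes = ["peer-a", "peer-b", "peer-c", "peer-d"]
ring = ToyRing(nodes)
keys = [f"key{i}" for i in range(200)]
placement = {key: ring.owner(key) for key in keys}

# Removing one peer only reassigns the keys that peer owned.
smaller_ring = ToyRing([n for n in nodes if n != "peer-c"])
moved = [k for k in keys if smaller_ring.owner(k) != placement[k]]
```

Since every peer can compute the owner of any key locally, no central master is needed to route requests, which is where the "no single point of failure" property comes from.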
Data Parallelization
• Cloud Processing Parallelization Service
  – Support the ability to easily leverage the large amount of processing resources available across multiple nodes
    • Instead of being limited to one node or a small number of nodes preconfigured in some type of physical cluster.
  – On DSC this may mean:
    • Parallelizing the processing of a file/source (e.g., ingest), to improve overall performance
    • Parallelizing the processing of data in the database in support of some analytic (exploitation or enrichment), or indexing, thereby improving overall response time.
  – Based on Hadoop MapReduce facility for data-centric parallelization
  – Note: Algorithmic parallelization (a la MPI) is a likely future need
MapReduce (MR)
• A facility to parallelize processing of files, with fault tolerance
• Implemented as Master/Worker model
• MR Job Tracker
  – Determines how to parallelize a job/application (using a Job configuration) and then schedules the work on a set of distributed Task Trackers (workers), each of which executes a portion of the job/application in parallel; it monitors them and re-executes failed tasks.
  – Tries to assign the work to the node where the data is located in HDFS
• MR Task Tracker
  – Task Trackers are given the application software to execute and a specification of which “data split” they need to process
  – Periodically report back progress/health to the Job Tracker.
• MR Job/Application
  – A Job/Application needs to be divided into various structural elements: a Mapper, a Reducer, a Partitioner, a Reporter and an Output Collector.
  – The Job/Application logic reads/writes data from the Dataspace via DSMS MR Helper API
• DSC Job Service:
  – Facilitates the submission and monitoring of MR jobs via a UI
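The roles of the Mapper, Partitioner and Reducer can be seen in a small word-count sketch. This is plain Python that runs sequentially in one process, not Hadoop code and not the DSMS MR Helper API; the function names and the CRC32-based partitioner are assumptions for illustration. In a real job each split would be handled by a different Task Tracker and each partition by a different reducer.

```python
import zlib
from collections import defaultdict


def mapper(split):
    """Emit (word, 1) for every word in one data split."""
    for word in split.split():
        yield word.lower(), 1


def partitioner(key, num_reducers):
    """Decide which reducer receives a key (deterministic across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    """Map every split, shuffle by partition, then reduce each partition."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:                      # map phase, one task per split
        for key, value in mapper(split):
            partitions[partitioner(key, num_reducers)][key].append(value)
    output = {}
    for partition in partitions:              # reduce phase, one task per partition
        for key in sorted(partition):
            out_key, total = reducer(key, partition[key])
            output[out_key] = total
    return output


counts = run_job(["the cloud", "the data the cloud"])
```

The partitioner guarantees that every occurrence of a key reaches the same reducer, which is what allows the reduce phase to run in parallel without coordination.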
Cloud Logging
• Cloud Logging Subsystem (CLS)
  – A proper logging facility is an extremely important service in an ultra-large scale environment.
  – Key functionality of the CLS includes:
    • Support custom and legacy applications
    • Support specialized cryptographic operations (e.g., encrypting, digital signatures)
    • Log “interaction” functionality including searching, reporting, analysis, viewing, etc.
    • Log management including rotating, archiving
  – Three modes:
    • Java API
    • Command Line bulk loader
    • Log4J adapter
  – The Security Auditing Subsystem leverages the capabilities of the CLS
    • Keeps security audit separate
  – Based on Cloudera Flume (essentially collectors and sinks)
Cloud Monitoring and Metering
• Cloud Monitoring and Metering
  – Holistic Monitoring
    • Ability to provide a unified/consistent presentation of the health of all monitored components (business application software, infrastructure software, operating systems, hardware, network devices), whether they are custom-developed or third-party acquired
  – Control
    • Ability to control/change the behavior of a monitored component without restarting this component
  – Near Real Time
    • Ability to alert system/network/security administrators in near real-time in response to an event from “inside” a component
  – Historic Trending
    • Ability to store performance (including resource utilization) and health data of various components over time for analysis
  – All devices (software and hardware) in the cloud are monitored, either using agents (push model) or by polling (pull model or agent-less)
    • All DSC components will include a JMX agent to report their health and support some control (where appropriate)
  – Based on Nagios agents and JMX agents
Cloud Management System
• Cloud Management System (CMS)
  – Oversees the efficient operation of the Cloud Computing Environment
• Condor
  – Process control and monitoring: restarts process if failure occurs
  – Distributed process pool
  – Can start distributed processes from any node in cloud
  – Integrated with DSC Cloud Management System
• DSC Cloud Management System (CMS)
  – Defines hierarchy of services and dependencies
  – Start/Stop cloud services (via Condor)
    • As a defined group
    • Individually
  – View status of running services via Nagios and exposed JMX beans
• HP Network Automation
  – Monitor and configure network devices
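A hierarchy of services and dependencies implies an ordering problem: a service should start only after the services it depends on, and stop before them. The sketch below shows one way to derive such an order with a topological sort from the Python standard library (`graphlib`, Python 3.9+). It is not the DSC CMS; the dependency map is a hypothetical example, and only the Cloudbase-on-HDFS edge is stated elsewhere in this deck.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: service -> services it needs running first.
DEPENDENCIES = {
    "zookeeper": set(),
    "hdfs": set(),
    "cloudbase": {"hdfs", "zookeeper"},
    "mapreduce": {"hdfs"},
    "web-tier": {"cloudbase"},
}


def start_order(dependencies):
    """Order services so every dependency starts before its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


def stop_order(dependencies):
    """Stop in the reverse order, so dependents go down first."""
    return list(reversed(start_order(dependencies)))


starting = start_order(DEPENDENCIES)
stopping = stop_order(DEPENDENCIES)
print("start:", starting)
print("stop: ", stopping)
```

`TopologicalSorter` raises `graphlib.CycleError` if the map contains a circular dependency, which gives an early check that the defined hierarchy is actually startable.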