[Chart: data volume over time, generated data vs. data available for analysis]
Sources:
• Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
• IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

What is Amazon Redshift?
Amazon Redshift is a fast, powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud.
• Easy to provision and scale
• No upfront costs, pay as you go
• High performance at a low price
• Open and flexible with support for popular BI tools

How does EMR work?
[Diagram: S3, EMR, EMR cluster]
• Put the data into S3
• Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc.
• Launch the cluster using the EMR console, CLI, SDK, or APIs
• Get the output from S3
• You can also store everything in HDFS

Resize Nodes
[Diagram: S3, EMR, EMR cluster]
• You can easily add and remove nodes

Resize Nodes with Spot Instances
Cost without Spot:
• 10-node cluster running for 14 hours
• Cost = 1.2 * 10 * 14 = $168
Add 10 nodes on Spot:
• 20-node cluster running for 7 hours
• On-demand nodes: 1.2 * 10 * 7 = $84
• Spot nodes: 0.6 * 10 * 7 = $42
• Total = $84 + $42 = $126
Result: 25% reduction in price, 50% reduction in time

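The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch using the slide's own figures ($1.20 per on-demand node-hour, $0.60 per Spot node-hour) and the slide's assumption that doubling the node count halves the runtime. The function and variable names are illustrative only.

```python
def cluster_cost(node_groups, hours):
    """Sum price * node_count * hours over (price, node_count) groups."""
    return sum(price * count * hours for price, count in node_groups)


ON_DEMAND_PRICE = 1.2  # $ per node-hour, figure taken from the slide
SPOT_PRICE = 0.6       # $ per node-hour, figure taken from the slide

# Baseline: 10 on-demand nodes for 14 hours.
baseline = cluster_cost([(ON_DEMAND_PRICE, 10)], hours=14)

# With Spot: 10 on-demand + 10 Spot nodes, assumed to finish in 7 hours.
with_spot = cluster_cost([(ON_DEMAND_PRICE, 10), (SPOT_PRICE, 10)], hours=7)

price_reduction = (baseline - with_spot) / baseline
time_reduction = (14 - 7) / 14

print(f"baseline=${baseline:.0f} with_spot=${with_spot:.0f}")
print(f"price reduction={price_reduction:.0%} time reduction={time_reduction:.0%}")
```

Real Spot prices fluctuate and Hadoop jobs rarely scale perfectly linearly, so the saving in practice depends on both.
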
1. Ad-Hoc Clusters – What are they?
[Diagram: S3, EMR cluster]
• When processing is complete, you can terminate the cluster (and stop paying)

1. Ad-Hoc Clusters – When to use
• Not using HDFS
• Not using the cluster 24/7
• Transient jobs

2. “Alive” Clusters – What are they?
[Diagram: S3, EMR, EMR cluster]
• If you run your jobs 24x7, you can also run a persistent cluster and use Reserved Instance (RI) pricing to save costs

2. “Alive” Clusters – When?
• Frequently running jobs
• Dependencies on map-reduce-map outputs

3. S3 instead of HDFS
[Diagram: S3, EMR, EMR cluster]
• S3 provides 99.999999999% durability
• Elastic
• Version control against failure
• Run multiple clusters with a single source of truth
• Quick recovery from failure
• Continuously resize clusters

4. S3 and HDFS
[Diagram: S3 → S3DistCp → HDFS on the EMR cluster]
• Load data from S3 using S3DistCp
• Benefits of HDFS
• Master copy of the data in S3
• Get all the benefits of S3

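The patterns above (transient vs. persistent cluster, Spot task nodes, S3 as the master copy with S3DistCp into HDFS) can be sketched as a single cluster request. This is a hedged illustration, not the presenters' configuration: it only builds a dictionary shaped like the arguments of boto3's `run_job_flow` EMR call and does not contact AWS. The bucket URIs, instance types, release label, HDFS paths and the `build_emr_request` helper are all assumptions made for the example.

```python
def build_emr_request(input_uri, output_uri, core_nodes=10,
                      spot_task_nodes=10, keep_alive=False):
    """Build kwargs shaped like boto3's EMR run_job_flow request (hypothetical helper)."""
    instance_groups = [
        {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},        # assumed type
        {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
    ]
    if spot_task_nodes > 0:
        # Extra capacity on Spot, as in the "Resize Nodes with Spot Instances" slide.
        instance_groups.append(
            {"Name": "task-spot", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": "m5.xlarge", "InstanceCount": spot_task_nodes})

    def distcp_step(name, src, dest):
        # S3DistCp copies between S3 and HDFS; S3 keeps the master copy.
        return {"Name": name,
                "ActionOnFailure": "CANCEL_AND_WAIT" if keep_alive else "TERMINATE_CLUSTER",
                "HadoopJarStep": {"Jar": "command-runner.jar",
                                  "Args": ["s3-dist-cp", "--src", src, "--dest", dest]}}

    return {
        "Name": "alive-cluster" if keep_alive else "ad-hoc-cluster",
        "ReleaseLabel": "emr-6.15.0",                              # assumed release
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}, {"Name": "Pig"}],
        "Instances": {
            "InstanceGroups": instance_groups,
            # False = ad-hoc cluster: terminate (and stop paying) after the last step.
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "Steps": [
            distcp_step("load input into HDFS", input_uri, "hdfs:///input"),
            distcp_step("save output to S3", "hdfs:///output", output_uri),
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical bucket names, for illustration only.
request = build_emr_request("s3://example-bucket/input/", "s3://example-bucket/output/")
print(request["Name"], len(request["Instances"]["InstanceGroups"]), "instance groups")
```

With AWS credentials configured, a request like this would be passed as `boto3.client("emr").run_job_flow(**request)`. A real job would also include the processing steps (Hive, Pig or a custom JAR) between the two copy steps.
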
Scott Crane
Packetloop – Big Data Security Analytics
CEO & Co-founder

Disclaimer and Urban Myth
• Customers must make the decision to upload data to Packetloop.
• We do not transparently intercept customer traffic, nor is it possible within AWS to do this.
• AWS does not give us access to any other AWS customer traffic.

What is Packetloop?
• Big Data Security Analytics
• Uses complete data set from the network flow via packet capture
• 100% delivered in the Cloud
• Instantly available, always up to date
• Powerful visualizations
• Intuitive to use
• Reduces security analysis to minutes

What business problems are we solving?
• Security-related information is growing exponentially
• The current generation of technology is struggling to deliver the intelligence organizations need, and these technologies create friction due to:
– Solution complexity
– Amount of integration and customization required
– Lack of context and fidelity
• Threats are becoming more complex, including blended attacks and long-running attacks (spanning months and potentially terabytes of flow data)
• Analysts have less time and are forced to be more reactive

Who are we targeting?
• Any organization that wants to know exactly what is happening on its networks, using information that can be determined in real time and information that can be added over time.
• Customers that are currently not receiving what was promised by SIEM solutions in terms of analytics, size and scale, fidelity and drill-down capabilities.
• Organizations that are already using Cloud providers such as AWS.
• Security consultants, analysts and penetration testers who want to take packet captures and quickly analyze them by uploading to the cloud.

What business challenges did we face?
The Vision:
• Fastest processing possible
• Infinite scale and storage
• Global presence
• Always be available and up to date
• Commodity affordability
The Reality:
• Small team of people
• Limited capital
• Based only in Sydney
• Current databases don’t scale the way we needed

Why choose AWS?
• Brand – number 1 in Cloud market
• Presence – everywhere we need to be
• Availability options – allows us to build in the resilience we need
• Flexibility and elasticity – only use what we need and when we need it, whilst supporting limitless horizontal growth
• Feature sets – always expanding, allows us to constantly refine our offering
• Support – AWS supports our business growth
• Cost – low to start with, always improving, easy to understand and predict

What do we use?
[Architecture diagram: www.packetloop.com behind an Elastic Load Balancer, in front of a VPC spanning two zones (US-WEST-2a and US-WEST-2b) joined by a VPC router. The VPC has four /24 subnets (A to D). Each zone has a Loop Network with WEB, LOOP, IPS, PgSQL and Cassandra (CASS) instances, NAT to Elastic IPs, and EMR clusters EMR-1 to EMR-N.]
• Cassandra replicates between Availability Zones
• Postgres is Active/Active between Availability Zones

What do we use?
• Elastic MapReduce (EMR) – Hadoop to process jobs to extract security analytics
• Cassandra – Patented insertion method for storing security metrics data
• PgSQL – user databases, customer settings
• IPS – 2 open source and 2 commercial to obtain indicators and warnings
• S3 – Packet capture storage, both long term and temporary
• VPC – handles replication and active/active traffic between Availability Zones
• Elastic Load Balancer – allows us to scale out Web instances as needed
• Cloudflare (not shown) – cache and acceleration

What has AWS allowed us to achieve?
• Global presence and big-company performance
• To be the first truly Cloud-centric Security Analytics tool
• Deliver a revolutionary security analytics tool to any user/analyst on the Internet as a commodity service (charged per GB per month)
• To dynamically change development and architecture direction without worrying about any capital investment we may have already made, and while maintaining a full production instance
• Determine exactly what we spend and link 100% of it to customer demand
• To remain a self-funded startup

What’s next?
• Shift from batch processing and post hoc analysis to real-time processing
• Addition of On Premise appliances, Virtual Machines and AMIs to perform local capture, preprocessing and transmission of security metrics to Cloud
• Additional modules for analyzing Sessions, Protocols and Files
• Move to Probabilistic Threat Analysis using machine learning

Do your own Big Data Security Analytics…
• Packetpig is an open source version of our Network Security Analytics toolset, available at github.com/packetloop/packetpig
• Optimised in October 2012 to use AWS Elastic MapReduce. How to configure: blog.packetloop.com/2012/10/packetpig-on-amazon-elastic-map-reduce.html
• Configurable scripts to specify what size AWS instances are used for Hadoop, and how many instances are to be spawned to run the mappers and reducers

Corey Loehr
corey.firstname.lastname@example.org
Executive, Digital Economy Enablement
Intel Australia and New Zealand

Analysis of Data Can Transform Society
• Create new business models and improve organizational processes.
• Enhance scientific understanding, drive innovation, and accelerate …
• Increase public safety and improve energy efficiency with smart grids.

Democratizing Analytics Gets Value out of Big Data
• Unlock Value in Silicon
• Support Open Platforms
• Deliver Software Value

Intel at the Intersection of Big Data
• HPC: Enabling exascale computing on massive data sets
• Cloud: Helping enterprises build open interoperable clouds
• Open Source: Contributing code and fostering ecosystem

Intel at the Heart of the Cloud
• Server
• Storage
• Network

Scale-Out Platform Optimizations for Big Data
Cost-effective performance
• Intel® Advanced Vector Extensions Technology
• Intel® Turbo Boost Technology 2.0
• Intel® Advanced Encryption Standard New Instructions Technology

Intel® Advanced Vector Extensions Technology
• Newest in a long line of processor instruction innovations
• Increases floating-point operations per clock, up to 2X performance [1]
[1] Performance comparison using Linpack benchmark. See backup for configuration details. For more legal information on performance forecasts go to http://www.intel.com/performance
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Intel® Advanced Encryption Standard New Instructions
• Processor assistance for performing AES encryption
– 7 new instructions
• Makes enabled encryption software faster and stronger

Power of the Platform Built by Intel
Richer user experiences
[Chart: TeraSort for a 1 TB sort, from 4 hours to 10 minutes. Platform components shown: previous Intel® Xeon® processor, Intel® Xeon® Processor E5-2600, solid-state drive, 10G Ethernet, Intel® Apache Hadoop. Reductions shown: 50%, 80%, 50%, 40%.]

Virtuous Cycle of Data-Driven Experience
• Cloud
• Intelligent Systems
• Clients