Girish Juneja - Intel Big Data & Cloud Summit 2013



  1. APAC Big Data & Cloud Summit 2013. Girish Juneja, GM, Big Data Software, Software & Services Group
  2. Data • Fab • Transistor • System Enablement • Optimization • Intelligence
  3. Data: 30 million networked sensors, growing at 30% a year • Computing: 1 trillion devices connected to the Internet by 2015 • Experience: 500 million smartphone users, increasing 20% a year • Social, machine-generated, and user-generated feedback loops are driving exponential growth
  4. Evolving towards end-to-end real-time analytics
     • 90s. Paradigm: reporting / data mining; high cost, isolated use. Architecture: batch ("sales reports"); sequential SQL queries. Platform: RDBMS.
     • 2000s. Paradigm: model-based discovery; high cost, departmental use. Architecture: batch (e.g., correlated buying patterns); shared disk/memory. Platform: proprietary MPP / data warehouse appliance.
     • Today. Paradigm: unbounded MapReduce queries; low cost, enterprise-wide use; arrival of vast amounts of unstructured data. Architecture: NoSQL, parallel analysis; real-time (e.g., recommendation engines); processing at the storage node; built-in data replication and reliability; shared-nothing, in-memory. Platform: open-source software loosely coupled to commodity hardware; unlimited linear scale through distributed node addition and multi-core nodes.
  5. Make big data work for you • Amount of data your enterprise will need to ingest: 50X • Proportion of that data that is useful to you: 10% • Projected increase in your IT budget: 10% • => Business as usual is not an option
  6. Benefit from Intel's long-standing investments • Software • Global Ecosystem • Security • Systems Architecture • Energy-Efficient Performance • Manufacturing Leadership
  7. Using volume economics to drive innovation
  8. Fabricating silicon for big data • 22nm: a revolutionary leap in process technology • 37% performance gain at low voltage¹ • >50% active power reduction at constant performance¹ • Process roadmap: 2007 45 nm, 2009 32 nm, 2011 22 nm • High-k metal gate: Intel lead vs. industry 3.5 years • Tri-Gate: Intel lead vs. industry 4 years
  9. Pumping the heart of the open datacenter • Intel® Xeon® Processor E7-4800 Product Family: highest reliability & scalability, highest memory capacity, highest enterprise & database performance • Intel® Xeon® Processor E5-4600 Product Family: density-optimized, cost-optimized, improved HPC performance. ¹Source: Published results as of 8 May 2012. See for full list of benchmarks and configuration details. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
  10. Enabling open source solutions: optimize software to take advantage of Intel® architecture (AES-NI; SSD, 10GbE; TXT, MCA, VT-*) • 3x performance in 3 years • Mission-critical deployments • Accelerates crypto in JBoss • 30x throughput • Trusted Compute Pools
  11. Contributing to Apache Hadoop • File-based encryption for Hadoop jobs • Cell-level ACLs for HDFS and HBase • Flash storage for MapReduce shuffle data • Caching and non-volatile memory for increased throughput • HDFS adaptive replication of hot files • HBase distributed tables across data centers • HDFS data replication across data centers • Archival storage support for cold data on HDFS • SSE instructions • JVM enhancements • InfiniBand RDMA support
  12. Supporting Intel Distribution for Apache Hadoop • Data Mining • Graph Analytics • Full Text Search • Full SQL • Batch Analytics • Security
  13. Intel® Distribution for Apache Hadoop* software • Granular access control in HBase • Up to 20X faster crypto with AES-NI* • 30X faster TeraSort on Intel® Xeon processors, Intel 10GbE, and SSD • Up to 8.5X faster queries in Hive* • Job profiling and configuration, automated by Intel® Active Tuner. *Based on internal testing. • Rhino: common authentication, access control, auditing • HPC: bringing MapReduce to data on Lustre FS • Enabling real-time 100% SQL on Hadoop • Cloud: optimizing Hadoop for virtualization & cloud
  14. Backed by a portfolio of datacenter products • Software • Network • Storage & Memory • Server • Cache Acceleration Software
  15. With broad support from the ecosystem. *Other names and brands may be claimed as the property of others.
  16. Proven in the enterprise: using the Intel® Distribution to gain tremendous results
  17. Putting advanced capabilities to work… • Expose new data • Dashboard/historical reporting • Real-time campaigns • Vertical apps • Predictive data services • Graph visualization • Log analysis …to solve real use cases • Fraud & threat detection • Life sciences research • Behavioral analysis • Warranty analysis • Customer segmentation • Infrastructure optimization. From Hype to High Performance
  18. Data-Driven Business: Customer Service • Value: enable subscriber access to billing data; 30X gain in performance; lower TCO • Analytics: real-time retrieval of 6 months of data; supports new BI with 15 types of queries; enables targeted ad serving and promotions • Data Management: 30 TB/month of billing data; 300K reads/second and 800K inserts/second; 133-node cluster / Intel Xeon E5 processors (diagram labels: CDR (call detail records), Subscriber Self Service, Intel Distribution)
  19. Data-Intensive Discovery: Genomics • Value: enable researchers to discover biomarkers and drug targets by correlating genomic data sets; 90% gain in throughput; 6X data compression • Analytics: provide curated data sets with pre-computed analysis (classification, correlation, biomarkers); provide APIs for applications to combine and analyze public and private data sets • Data Management: use Hive and Hadoop for query and search; dynamically partition and scale HBase; 10-node cluster / Intel Xeon E5 processors / 10GbE (diagram label: Intel Distribution)
  20. Data-Rich Communities: Smart City • Value: enforce traffic laws and detect license fraud; monitor and predict traffic patterns in a city of 31 million people • Analytics: detect traffic law violations automatically; detect driver license fraud by data mining; forecast traffic with predictive analytics • Data Management: 30,000 cameras; 6 Mb/s stream rate per camera; 15 PB of images in use / 2B records in HBase (diagram labels: Detection, Prevention, Regional, Local)
  21. Catalyzing the ecosystem: foster the ecosystem and develop new markets for Intel and its partners
  22. Resources • Content: case studies, whitepapers, demos • Contacts: Girish Juneja, RK Hiremane, Eddie Toh
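A rough reading of the slide 5 figures ("Make big data work for you") shows why business as usual breaks down. This is a back-of-the-envelope sketch, not a calculation from the talk. It assumes that the 50X and 10% multipliers are both relative to today's ingest volume and IT budget, and that under business as usual cost grows linearly with the volume ingested.

```latex
\begin{aligned}
\text{required drop in cost per unit of data ingested} &= \frac{50}{1.10} \approx 45.5\times \\
\text{useful share of the new volume} &= 0.10 \times 50 = 5\times \text{today's total ingest}
\end{aligned}
```

Under these assumptions, the cost of handling each unit of data would need to fall by roughly 45X just to stay within budget, even though only a tenth of that data is expected to be useful.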
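The slide 20 smart-city figures also imply a large raw ingest rate. This is a minimal sketch that assumes every one of the 30,000 cameras streams continuously at the stated 6 Mb/s, with decimal prefixes and 8 bits per byte. The symbols N_cam and r_cam are introduced here for illustration and do not appear in the slides.

```latex
\begin{aligned}
R_{\text{total}} &= N_{\text{cam}} \times r_{\text{cam}} = 30{,}000 \times 6~\text{Mb/s} = 180{,}000~\text{Mb/s} = 180~\text{Gb/s} \\
&= \frac{180~\text{Gb/s}}{8~\text{b/B}} = 22.5~\text{GB/s} \\
V_{\text{day}} &= 22.5~\text{GB/s} \times 86{,}400~\text{s/day} = 1{,}944{,}000~\text{GB/day} \approx 1.94~\text{PB/day}
\end{aligned}
```

The volume actually stored depends on what is retained. The slide reports 15 PB of images in use and 2B records in HBase.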