1. Big Data - Implications for Semiconductor Players. Presentation, May 2014
2. Big Data is causing seismic shifts, with 4 important implications for semiconductor players
   I. The market for field nodes and embedded sensors will grow, fuelled, e.g., by the "internet of things," reaching ~25 bn devices for MEMS alone in 2018
   II. HPC-as-a-service will be of increasing importance, strengthening the position of cloud service providers in the relevant ecosystem
   III. Fast memory access technologies – (a) in-memory databases/filesystems, (b) ultra-fast data fabrics and (c) SW optimized for Big Data retrieval – will be important to mine petabyte-scale data for insights in real time
   IV. 3D integration (3D-IC) of SoCs will gain even more importance, integrating more sensors and memory, driven by physical data collection and the criticality of fast memory access
   SOURCE: Team analysis
3. Outline of today's discussion
   • What is Big Data?
   • The three V's of Big Data – implications along each dimension: Volume, Variety, Velocity
   • What's next?
4. WHAT IS BIG DATA? Everyone, everything, every interaction leaves a data vapor trail
   • Transactions: trillions of data points from enterprise systems (e.g., ERP), inter-company activities (e.g., clearinghouses), and consumer activity
   • Mobile: 5bn phones worldwide; growing share of smartphones
   • Sensors: embedded M2M sensors growing 30% annually, installed in everything from cars, roads, buildings and appliances to medical devices
   • Social: 30bn items shared on Facebook alone; 200m+ tweets a day
   • Scientific/engineering: satellite images, geophysical data, and genetic data
   • Audio/video: with increasing ability to automatically scan and extract information, e.g., facial recognition
   Big data in a single day online: enough information is consumed to fill 168 million DVDs; 294bn e-mails are sent; 4.7M minutes are spent on Facebook; 2 million blog posts are written; 864,000 hours of video are uploaded to YouTube; more iPhones are sold than babies are born. Internet of Things: 5,000x more CPU power vs. 1995; field-node sales growing at 35% CAGR, with over 200 million nodes in 2015.
5. WHAT IS BIG DATA? ...while trying to improve processing times
   • Enterprises are faced with a doubling of data scale every 18 months [Chart: new digital data touched by enterprises vs. enterprise disk storage capacity shipments, petabytes, 2005-2015E]
   • ...and with ever higher complexity of data types [Chart: unstructured vs. structured data in enterprise disk storage capacity shipments, petabytes, 2005-2015E]
   • 2-4 weeks: current latency between operational and analytical systems
   • Organizations are seeking solutions for managing an ever-increasing amount and variety of available information – "big data"
   SOURCE: IDC, Credit Suisse
6. WHAT IS BIG DATA? New capabilities and enablers are required to capture Big Data opportunities
   • Large data sets, new data sources: real-time collection and management of big data sets; data architecture and capabilities allowing for analysis across varied data sets
   • Expertise: deep experience/understanding of core analytics (machine learning, applied math, statistics, econometrics); ability to build and deploy analytic tools to optimize performance
   • Technology: hardware to facilitate processing and storing large datasets; software programs and platforms to use and build sophisticated analytical models and tools; high data security
   • Sophisticated analytic techniques: core methods (regression, CHAID) combined with new methods; language and pattern recognition; time to develop distinctive analysis or tools will be greatly reduced
   SOURCE: McKinsey analysis
7. WHAT IS BIG DATA? Big data can generate significant financial value across sectors
   • Europe public sector administration: €250 billion value per year; ~0.5 percent annual productivity growth
   • US health care: $300 billion value per year; ~0.7 percent annual productivity growth
   • Manufacturing: up to 50 percent decrease in product development and assembly costs; up to 7 percent reduction in working capital
   • US retail: 60+% increase in net margin possible; 0.5-1.0 percent annual productivity growth
   • Global personal location data: $100 billion+ revenue for service providers; up to $700 billion value to end users
   SOURCE: McKinsey Global Institute analysis
8. WHAT IS BIG DATA? Three V's of Big Data – Volume, Variety and Velocity
   [Diagram: Big Data sits at the high end of three axes, each running from low to high – (1) Volume, (2) Variety, (3) Velocity]
9. VOLUME (1): We have entered a new world of measurement, with the 'internet of things' creating immense amounts of machine-generated data
   • Background: "Big data starts with little data"; "Little data comes from intelligent sensor systems in the IoT"
   • Building blocks of an intelligent sensor system: long-term autonomous sensor system; low-bit-rate network technology; power-efficient application processor technology; MEMS RF components; sensors; optional energy harvesting
   • Bottlenecks today: A) sensor technology (mostly non-MEMS) – too high energy consumption, lack of application-tailored systems, too expensive, lack of standardized interfaces and protocols; B) communication protocol – lack of a standardized low-power, wide-range protocol2
   • MEMS technology enables: sensor functionality; small size and weight; low power consumption or even power autonomy; radio performance
   • "MEMS sensors, together with power efficient MCUs and intelligent data networks will deliver the promise of Internet of Things" – Babak Taheri, VP/GM at Freescale, January 20141
   1 Freescale is at the forefront of low-power ARM MCUs; e.g., on February 25th Freescale released the KL03 at MWC, the smallest ARM MCU to date, with a 1.6mm x 2mm footprint
   2 Gap between Bluetooth Low Energy (short range) and wireless (high power)
   SOURCE: McKinsey analysis
10. VOLUME (1), Implication I: Industrial sensor nodes for the Internet of Things will fuel MEMS market growth mid- and long-term
   • Estimated number of MEMS devices: 9.9 billion (2013) growing to 24.4 billion (2018), +21% p.a.
   • Growth opportunity in MEMS devices and integrated sensor nodes: the enabling technology, specifically for industrial IoT nodes, is a small, cheap, robust and low-power sensor
   • This will enable the "everywhere-deployment" of sensors for machine health monitoring, the smart home or building technology
   SOURCE: McKinsey analysis
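The growth figure on this slide can be sanity-checked with a one-line compound-annual-growth-rate calculation (all numbers taken from the slide; the small gap to the "+21% p.a." label is rounding):

```python
# Sanity check of the slide's MEMS figures: 9.9bn devices in 2013 growing to
# 24.4bn in 2018 implies a compound annual growth rate of roughly 20%,
# consistent with the "+21% p.a." label on the chart.

devices_2013, devices_2018 = 9.9, 24.4   # billions of MEMS devices
years = 2018 - 2013
cagr = (devices_2018 / devices_2013) ** (1 / years) - 1
print(f"CAGR: {cagr:.1%}")
```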
11. VARIETY (2): Machine learning is the science (or art) of building algorithms that can recognize patterns in data and improve as they learn (EXAMPLE: SVM)
   • Traditional methods often provide good approximations, but not in complex cases: simple linear classifiers only work for linearly separable data; k-means is often a good approximation, but is not very flexible (e.g., spiral data)
   • ML methods, such as support vector machines (SVMs), are much more flexible. SVMs can identify complex relationships in the data by: converting the problem into a higher-dimensional space to find the hyperplane that best separates the data; projecting a "shadow" of this hyperplane back into normal space to draw the decision boundary
   • Requires repeated n x n matrix multiplication
   SOURCE: Shark machine learning library; McKinsey analysis
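The "convert the problem into a higher-dimensional space" idea can be sketched in a few lines of plain Python. This is an illustrative toy, not the Shark library's implementation: the explicit feature map x -> (x, x^2) stands in for the kernel an SVM would use, and the data and helper function are invented for the example.

```python
# Toy illustration of the kernel-SVM idea: 1-D points labeled by whether they
# fall inside [-1, 1] are not linearly separable on the line, but after the
# feature map x -> (x, x^2) a single threshold on x^2 separates the classes.

def feature_map(x):
    """Lift a 1-D point into 2-D, where the classes become linearly separable."""
    return (x, x * x)

def linearly_separable_1d(points, labels):
    """True if some single threshold on the given coordinate separates the classes."""
    inside = [x for x, y in zip(points, labels) if y == 1]
    outside = [x for x, y in zip(points, labels) if y == -1]
    return max(inside) < min(outside) or max(outside) < min(inside)

points = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [-1, -1, 1, 1, 1, -1, -1]        # +1 inside [-1, 1], -1 outside

# Not separable by any single threshold in the original 1-D space:
print(linearly_separable_1d(points, labels))              # False

# In the lifted space the second coordinate (x^2) separates the classes with
# the hyperplane x^2 = 1, whose "shadow" back in 1-D is the boundary pair x = ±1:
lifted_second = [feature_map(x)[1] for x in points]
print(linearly_separable_1d(lifted_second, labels))       # True
```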
12. VARIETY (2): Example unsupervised learning – auto-encoder networks discover intrinsic structure in high-dimensional data
   • Linear methods are insensitive to higher-order nonlinear structures – e.g., linear methods cannot unroll the "Swiss roll"
   • The "curse of dimensionality" motivates the need to reduce the complexity of data down to its intrinsic dimensionality
   • More sophisticated methods are necessary to uncover patterns in high-dimensional data; auto-encoders require repeated n x n matrix multiplication
   • Classification performance example: a linear method vs. an auto-encoder network, each applied to ~800,000 news stories
   SOURCES: G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, July 2006, Volume 313, Number 5786, pp. 504-7; Sam T. Roweis and Lawrence K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, December 2000, Volume 290, Number 5500, pp. 2323-6
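The mechanics behind this slide (encode to a low-dimensional code, decode, minimize reconstruction error) can be sketched with a toy linear auto-encoder in plain Python. The dataset, weights and learning rate below are invented for illustration, and real auto-encoders such as Hinton's use deep nonlinear layers; this sketch only shows the training loop.

```python
# Toy linear auto-encoder: compress 2-D points lying on the line y = 2x down
# to one "intrinsic" coordinate, then learn by gradient descent on the
# reconstruction error to decode them back.

data = [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0), (0.5, 1.0), (-2.0, -4.0)]

a, b = 0.5, 0.5    # encoder weights (2 -> 1)
c, d = 0.5, 0.5    # decoder weights (1 -> 2)

def loss():
    total = 0.0
    for x1, x2 in data:
        h = a * x1 + b * x2            # encode to a 1-D code
        r1, r2 = c * h, d * h          # decode back to 2-D
        total += (r1 - x1) ** 2 + (r2 - x2) ** 2
    return total / len(data)

initial_loss = loss()
lr = 0.01
for _ in range(2000):                  # full-batch gradient descent
    ga = gb = gc = gd = 0.0
    for x1, x2 in data:
        h = a * x1 + b * x2
        e1, e2 = c * h - x1, d * h - x2   # reconstruction errors
        gc += 2 * e1 * h
        gd += 2 * e2 * h
        ga += 2 * (e1 * c + e2 * d) * x1
        gb += 2 * (e1 * c + e2 * d) * x2
    n = len(data)
    a -= lr * ga / n
    b -= lr * gb / n
    c -= lr * gc / n
    d -= lr * gd / n

final_loss = loss()
print(initial_loss, "->", final_loss)  # reconstruction error shrinks toward 0
```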
13. VARIETY (2), Implication II: Machine learning algorithms can operate ~30-80 times faster on GPUs
   Run times (seconds), CPU2 vs. GPU3 implementation, and the resulting speed-up:
   • Multi-class kernel-based SVM1 – ADULT: 61 vs. 1.2 (~52x); MNIST: 264 vs. 3.9 (~68x); TIMIT: 276 vs. 3.5 (~78x); COV1: 16,200 vs. 432 (~38x)
   • CD-DNN-HMM4 on a business search dataset5 (Bing mobile voice search): 5.2 x 10-3 vs. 0.2 x 10-3 (~26x)
   These are typical training data sets; e.g., TIMIT is a dataset of North American speech.
   1 SVM = support vector machine; run times do not include time spent during initialization (or clustering). 2 CPU = central processing unit; CPU implementation of LIBSVM: K. Crammer and Y. Singer, "On the algorithmic implementation of multiclass kernel-based vector machines," JMLR, 2265-292, March 2002. 3 GPU = graphics processing unit; GPU implementation of SVM: A. Cotter, N. Srebro, J. Keshet, "Proceedings of the 17th ACM SIGKDD," an international conference on knowledge discovery and data mining, 2011. 4 Context-dependent Deep Neural Network / Hidden Markov Model hybrid. 5 24 hours of recorded data, containing 32,057 utterances
   SOURCE: A. Cotter, J. Keshet, and N. Srebro, "Proceedings of the 17th ACM SIGKDD," an international conference on knowledge discovery and data mining, 2011; G. E. Dahl et al., "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition," IEEE Transactions on Audio, Speech, and Language Processing 20.1 (2012): 30-42; team analysis
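The speed-ups quoted here are simply ratios of the two run times; recomputing them from the slide's figures lands close to the labels (small differences are rounding in the source):

```python
# CPU-vs-GPU run times taken from the slide, in seconds; the speed-up is the
# ratio CPU time / GPU time.

runtimes = {
    "ADULT": (61.0, 1.2),
    "MNIST": (264.0, 3.9),
    "TIMIT": (276.0, 3.5),
    "COV1": (16_200.0, 432.0),
    "Bing mobile voice search": (5.2e-3, 0.2e-3),
}

speedups = {name: cpu / gpu for name, (cpu, gpu) in runtimes.items()}
for name, s in speedups.items():
    print(f"{name}: ~{s:.0f}x faster on GPU")
```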
14. VARIETY (2): HPC-as-a-service models are taking off (NOT EXHAUSTIVE)
   • 2008: Nov – nVidia releases the first 'desktop supercomputer'1
   • 2010: Amazon releases a GPU instance on the Elastic Compute Cloud (EC2) in November; Nimbix (Oct 2010), Peer1 (Jun 2010) and others offer GPU instances in the cloud
   • 2011: the Amazon EC2 GPU cluster is ranked 72 of the top 500 supercomputers in the world, ahead of NASA Goddard Space Flight Center and Lawrence Livermore National Lab; Amazon brings GPU instances for cloud computing to Europe, starting with Ireland in June; Nov 2011 – Zillians releases the first commercial virtualized GPU (vGPU) in the cloud
   • 2012: the UK Government, responding to an unmet need in Europe, unveils CORE to provide HPC-as-a-service to UK businesses, developed in partnership with the University of Cambridge and Imperial College London; Jul 2012 – nVidia unveils Kepler, optimized for GPU computing in the cloud; April 2012 – SoftLayer offers a GPU instance in the cloud; GigaOm says that IBM's "HPC prowess and new vertical-focused cloud strategy could make a CPU-GPU cloud for the oil and gas industry, for example, a realistic offering."
   Fast facts – GPU computing today: 200 million CUDA-capable GPUs; 600,000 CUDA Toolkit downloads; 100,000 active GPU computing developers
   1 Developed together with partners, using nVidia Tesla GPUs
   SOURCE: Press search; nVidia press release; team analysis
15. VARIETY (2): Under certain circumstances, large service providers like Google and AWS may design their own semi components
   • AWS is setting up a chip design center in Austin, TX and has hired several Calxeda team members
   • Google acquired Agnilux for ARM-based server chip design and partnered with IBM in the OpenPOWER Foundation
   • Required: access to chip design talent (e.g., through M&A); access to processor core architectures (e.g., ARM, Power); access to a leading-edge manufacturing node (e.g., through TSMC, Samsung, GlobalFoundries)
   • Are recent moves of cloud service providers an indication of more fundamental changes in the ecosystem? In the wireless handset market, Apple has gone down a similar road successfully and now designs its own processors
   SOURCE: McKinsey; press clippings
16. VELOCITY (3): As data flows in faster and data centers grow bigger, speed and power consumption become critical – memory is a key bottleneck
   Latency (read/write access time):
   • RAM: 83 nanoseconds (83 x 10-9 seconds) – for scale, an F-18 Hornet at its max speed of 1,190 mph
   • Hard disk: 13 milliseconds (13 x 10-3 seconds) – for scale, a banana slug at its max speed of 0.007 mph
   Energy consumption – the high cost of data movement (28 nm process, 20mm die): fetching operands costs more than computing on them. [Chart: a 64-bit DP operation costs ~20 pJ, while 256-bit accesses to 8kB SRAM, 256-bit on-chip buses, DRAM reads/writes and efficient off-chip links cost from tens of pJ up to hundreds of nJ]
   SOURCE: nVidia presentation (http://www.nvidia.com/content/PDF/sc_2010/theater/Dally_SC10.pdf); press search; team analysis
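The F-18/banana-slug analogy is just the latency ratio made vivid; working it out from the slide's numbers shows why the comparison was chosen:

```python
# The slide's RAM-vs-disk analogy as arithmetic: a hard-disk access (~13 ms)
# is on the order of 150,000x slower than a RAM access (~83 ns), roughly the
# same ratio as an F-18's top speed to a banana slug's.

ram_s = 83e-9        # RAM read/write latency, seconds
disk_s = 13e-3       # hard-disk access latency, seconds
latency_ratio = disk_s / ram_s

f18_mph, slug_mph = 1190.0, 0.007
speed_ratio = f18_mph / slug_mph

print(f"disk/RAM latency ratio: ~{latency_ratio:,.0f}x")
print(f"F-18/banana-slug speed ratio: ~{speed_ratio:,.0f}x")
```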
17. VELOCITY (3): Performance gains take their toll – cooling requirements increased dramatically, forcing vendors to look at "new solutions"
   Leading-edge graphics card comparison:
   • From – 2003, ATI Radeon 9700 (~50W): small & light metal heatsink; small fan with reasonable rotating speeds; cooling material cost ~$5-10
   • To – 2014, AMD R9 295X2 (~500W): sophisticated closed-loop water cooling solution; 2 integrated & connected Asetek pumps (one for each GPU IC) with sealed tubes plus a separate heat exchanger requiring case space; cooling material cost ~$100+
   SOURCE: ATI; AMD; AnandTech; McKinsey
18. VELOCITY (3), Implication III: Leading tech players are attacking this problem along multiple fronts (NOT EXHAUSTIVE)
   i. New databases/filesystems – BigTable/HBase/Cassandra (NoSQL): non-relational, distributed databases that can store petabyte-scale, sparse data; compression, in-memory operation, Bloom filters
   ii. Cluster programming models/languages – MapReduce/Hadoop, STORM: Google developed MapReduce in 2004, splitting problems across nodes in a cluster (key steps: Map, Reduce); Facebook has the largest Hadoop cluster (100 PB); STORM (developed by Twitter) and Kafka allow real-time (vs. batch) processing
   iii. Novel querying & analysis tools: ad-hoc querying – Dremel (Google, BigQuery), Drill (open-source); graph analysis – Pregel (Google), Gremlin, Giraph (open-source); analytics – Matlab, Julia, R (open-source)
   iv. Ultra-fast data fabrics for multi-core servers: high-performance, cache-coherent interconnects for multi-core enterprise servers; critical for micro-servers
   v. In-memory databases – SAP HANA: rely on main memory (RAM) vs. disk storage; faster, more predictable performance
   Semi. players are partnering with SW developers to optimize for Big Data performance, e.g., Dell/ARM + Apache
   SOURCE: Press search; team analysis
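The MapReduce model named on this slide can be sketched with the classic word-count example, simulated in a single process (real frameworks such as Hadoop run the same three steps across many cluster nodes; the documents and function names here are invented for illustration):

```python
# Minimal in-process sketch of MapReduce word count: "map" emits (word, 1)
# pairs, a shuffle groups pairs by key, and "reduce" sums each group.

from collections import defaultdict

def map_phase(document):
    """Emit one (key, value) pair per word."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts for each word."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data big compute", "data fabrics move big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)   # {'big': 3, 'data': 3, 'compute': 1, 'fabrics': 1, 'move': 1}
```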
19. WHAT'S NEXT? Implication IV: Bringing it all together – researchers believe integration of more sensors and memory onto one chip, leveraging 3D techniques, will increase
   SoCs are becoming the standard, integrating ever more chips onto one die. Example: the Qualcomm Snapdragon S3 provides a complete system solution for smartphones:
   • GPU: Adreno 220 high-performance 3D processor
   • Display: support for 1080p HD video
   • Camera: dual camera up to 16M pixels, stereoscopic 3D kit
   • CPU: 1.5GHz dual-core ARM-based Scorpion CPU
   • GPS: gpsOne Gen8
   • USB: USB 2.0 High Speed OTG (480Mbps)
   • Modem: 3G modem (HSPA+/1xAdv/1xEV-DO/GSM/GPRS/EDGE)
   Researchers see this trend continuing, with 3D integration enabling further progress:
   • Emerging 3D IC tech: TSV, silicon interposer, micro bump, 2.5D-IC, 3D-IC
   • Benefits: power, cost, reuse
   • Yole Développement reports the 2011 market for 3D IC devices to be $2.7B, growing at 18% CAGR
   • TSMC and Synopsys unveiled comprehensive 3D-IC design flows in Q3 2012
   SOURCE: James J.-Q. Lu, "3D Hyper-Integration: Past, Present and Future," Future Fab International, Vol. 41, April 2012; Sematech; Synopsys; press search; Semi.org (preview of SEMICON Taiwan 2012); team analysis
20. ...and further performance gains tackled through FE & BE innovations (nVidia roadmap)
   Two main GPU application performance challenges (bandwidth bottlenecks):
   • (1) Memory bandwidth: GDDR5 topping out at ~384-461GB/s for 384/512-bit configurations
   • (2) Graphics card interconnect interface: PCIe at 16GB/s, vs. CPU memory at 60GB/s and GPU memory at 288GB/s
   ...addressed by new memory and interconnect interfaces:
   • 3D memory introduction – 3D chip-on-wafer integration (HBM1): memory bandwidth expected to increase by 2-4x; memory capacity expected to increase by ~2.5x; energy efficiency gains of ~4x
   • "NVLINK" introduction: performance expected to be 5-12x of PCIe; cache coherency in Gen2; unified memory architecture; works across multi-GPU configurations
   1 Hybrid Memory Cube
   SOURCE: nVidia; McKinsey
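The interconnect gap described on this slide is easy to express as simple ratios (all figures from the slide; the 80-192 GB/s range for NVLink is just "5-12x of PCIe" multiplied out, not a published spec):

```python
# Bandwidth ratios from the slide: at 5-12x the bandwidth of PCIe (16 GB/s),
# NVLink would land at roughly 80-192 GB/s, closing much of the gap to the
# 288 GB/s GPU memory figure.

pcie_gbs = 16.0
gpu_mem_gbs = 288.0
nvlink_low, nvlink_high = 5 * pcie_gbs, 12 * pcie_gbs
print(nvlink_low, nvlink_high)      # 80.0 192.0
print(gpu_mem_gbs / pcie_gbs)       # GPU memory has 18x the bandwidth of PCIe
```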
21. What's next?
   • All semiconductor companies must adapt to the era of Big Data and the Internet of Things. Big Data presents a huge opportunity for semiconductor companies to power this next phase of growth.
   • Semiconductors of many kinds – CPUs, GPGPUs, chip-based sensors, integrated field nodes, 3D ICs and optical chip interconnects – will power the core of the machines that drive the next S-curve in productivity and insight generation.
   • Given the context of the broad Big Data revolution, chip players will need to acknowledge that software analytics will be as important as their precious semiconductors.
   • To tackle the challenges of deploying advanced analytics for a Big Data world, semiconductor companies will benefit from alliances with, or even acquisitions of, the software and middleware players also working on their pieces of the Big Data puzzle.
