3. What drives Big Data?
- Volume and type of data: 10X data growth by 2016, 90% of it unstructured (1)
- Falling cost of compute and storage, 2002-2012 (2): server cost down 40%, storage down 90% in $/GB
- Investment in Big Data tools and services (3): $3B in 2010, $7B in 2012, $17B by 2017
1: IDC Digital Universe Study December 2012
2: Intel Forecast
3: IDC WW Big Data Forecast
4. Today's applications of Big Data
- Operational efficiency
- Customer behavior analysis
- Traffic optimization
- Smart grids
- Use of geolocation data
- Customer protection programs
- Security and risk management
- Personalized preventive care
- Reduction of fraudulent claims
5. Big Data at Intel
Test-time reduction: predictive analytics in manufacturing for rapid identification of defective units; expected impact of ~$20M in 2013.
Sales channel management: improved reseller engagement based on profile analysis; sales up $5M per quarter while costs fell by $6M.
Malware detection: analysis of ~4 billion events per day at the system, network, and application level to anticipate intrusion threats.
7. Yet today's datacenter infrastructure holds back innovation
Networks: 2-3 weeks to deploy new services (1); 66% CAGR in mobile traffic (2)
Storage: 40% CAGR data growth, 90% of it unstructured (3)
Servers: average utilization below 50% despite virtualization (4)
1: Intel IT internal estimate
2: Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2012–2017
3: IDC's Digital Universe Study, sponsored by EMC, December 2012
4: IDC Server Virtualization and The Cloud 2012
9. SDI: Transforming the network
Hardware-defined: network functions (VPN, firewall, and others) run as fixed appliances that combine the network functions, their management, and data transport.
Virtualized: network function virtualization moves VPN, firewall, and other functions into VMs (VM1, VM2, VM3) on standard platforms, with virtual networks overlaying the data transport.
Software-defined: an SDN controller on standard platforms programs and deploys pooled network resources centrally, with network functions running in VMs.
10. Transforming the network: Intel building blocks
Intel® Architecture and communications acceleration: Intel Communications Chipset, Intel® Ethernet, Intel NICs, Intel VT-c, VT-x, and VT-d.
Ecosystem enabling: Intel Open Network Platform Server Reference Design, Intel® Network Builders.
Software: ONP with the Alta switch; performance, standards, open modules, and open APIs, including the QuickAssist APIs.
11. Transforming storage
Traditional storage: SAN with shared capacity, high performance, and data protection, at terabyte (TB) scale.
Tomorrow's storage, storage as a service: data tiered by access frequency from hot to cold (frequent to infrequent access); capacity at zettabyte (ZB) scale; a choice of optimized solutions; defined by applications; higher efficiency.
12. Transforming storage: Intel building blocks
Next-generation NVM; accelerators and SoCs; ecosystem enabling.
Software: VMware, Microsoft, Red Hat, Nexenta, plus others; Intel Storage Acceleration Libraries (ISA-L); Cache Acceleration Software; Enterprise Edition for Lustre.
Intelligence for efficiency and resiliency; tiering for capacity and availability.
13. Intel® CAS with Intel® SSDs
Used as a cache, accelerates Big Data operations:
50X IOPS, 3X TPC-C throughput, 20X TPC-H performance
http://www.intel.com/content/www/us/en/mission-critical/mission-critical-scalability-oracle-intel-brief.html
15. Flexible Rack Scale infrastructure
Today, physical aggregation: shared power, shared cooling, rack management; CPU/memory modules and storage in each server.
Tomorrow, subsystem aggregation: Open Network Platform; I/O; storage (PCIe SSD and caching); compute; photonics and switch fabric; memory; silicon: Intel® Atom™ and Xeon.
Future, fabric-based aggregation: orchestration and SDN over pooled compute, pooled storage, pooled memory, and shared boot.
An economical, adaptive datacenter infrastructure.
16. Processing methods for different data types
Unstructured data: new technologies such as MapReduce/Hive.
Structured data: relational databases.
Analytical tools: EXALYTICS.
* Other names and brands may be claimed as the property of others.
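The MapReduce model named on the slide above can be sketched in miniature: a map phase emits (key, value) pairs, a shuffle groups values by key, and a reduce phase aggregates each group. This is an illustrative single-process sketch of the programming model, not Hadoop or Hive code; real frameworks distribute these same phases across a cluster.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Map: emit (word, 1) for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values that share a key.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [v for _, v in group]

def reduce_phase(grouped):
    # Reduce: aggregate each key's values (here, a word count).
    for key, values in grouped:
        yield key, sum(values)

counts = dict(reduce_phase(shuffle(map_phase(["big data big insight"]))))
print(counts)  # {'big': 2, 'data': 1, 'insight': 1}
```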
17. Big Data: Intel's role
End-to-end analytics: Cache Acceleration Software, AIM Suite, Intel® Enterprise Edition for Lustre.
Datacenter products: servers, storage, networks, and edge devices and gateways.
Ecosystem enabling.
Other names and brands may be claimed as the property of others.
Key message: We've always used data analysis, but the ability to handle much larger data sets from a wider array of sources far more cheaply is the driver of Big Data.

By 2016 there will be 19 billion network-connected devices, up from 10.3 billion in 2011: 5 billion personal devices (PCs, laptops, smartphones, tablets) and 14 billion M2M devices.

Data growth: IDC's definition focuses on "the measure of all the digital data created, replicated, and consumed in a single year." 2010: 1,200 Exabytes. 2016: 12,000 Exabytes. 2020: 40,000 Exabytes (1 Exabyte = 1,000 Petabytes). (IDC Digital Universe Study, December 2012)

Lower cost of compute/storage: server system pricing declined from $11.5K in 2002 to ~$6.5K in 2012 (IDC WW Server Tracker). The cost of storage declined from $24 per GB to $1.50 per GB from 2002 to 2012 (IDC Storage Tracker).
Key message: Big Data benefits Intel in many ways, from manufacturing improvements to safer infrastructure to better-targeted sales activities.

Intel is not just talking about Big Data; we use it inside the company to improve our own ROI. We understand how to harvest value out of data, and we are willing to bring in our IT and business solutions teams to discuss our approach with you.

Intel TMG realized a savings of $2M through test-time reduction for Ivy Bridge and projects $20M for Haswell in 2013. The system predicts which units will fail and avoids lengthy testing.

We have also increased our analysis of customer purchasing and usage habits to better identify customer needs. The resulting better-targeted sales activity increased revenue by $5M per quarter while our costs decreased by more than that.

Intel is often the target of malicious security attacks, and we have logged that data for quite some time. We are now analyzing that data more thoroughly to improve our perimeter and defense-in-depth security practices. Since security is a key concern for nearly every enterprise, we expect our experience will help other companies better secure their environments.
Networks: complex, manually configured, and slow to provision new services. Existing topologies make deploying new services difficult; compute and storage have gained flexibility through virtualization and new storage approaches, so the network is often the barrier. Network configuration requires manual setup by skilled IT staff who could instead be doing higher-value work such as network security or optimization. Per Intel IT, provisioning the network to support a new application takes 2-3 weeks, mostly due to limited skilled staff juggling multiple jobs (per Sridhar Mahankali).

Comms: the wireless edge is inadequate for new service delivery for mobile and IoT (e.g., latency).

Storage: the data deluge is overwhelming (40% CAGR, up to 18 copies of the same data) while user demands grow with Big Data analytics. Data growth challenges every datacenter; users, from IT to the consumer, want to use data in new ways, and quickly finding the needle in the haystack is the new expectation. Transferring this volume of data for processing within a traditional complex, rigid storage architecture is costly and time-consuming, driving the need for novel application/storage architectures. New application "mashups" change expectations about how data is used and accessed, and those changes need to happen quickly. IDC forecasts 40 Zettabytes of data stored by 2020: over 5 GB for every person on earth, and 57 times the number of grains of sand on all the beaches on earth. IDC estimates that 23% of stored data would be useful if it could be effectively analyzed, but currently less than 1% of data is actually analyzed, leaving many potential discoveries on the table.
33% of the data generated in 2013 will come from machines (not people); by 2020, 42% of data will be machine-generated (per IDC's Digital Universe Study, sponsored by EMC, December 2012). Per IBM, firms average more than 18 copies of their data (disclosed at EDGE 2013 by Ed Walsh).

Servers: the share of physical x86 servers running a hypervisor (vs. a single OS on bare metal) was 51% in 1H'12, up from 38% in 2H'11 (IDC Server Virtualization and The Cloud 2012, End-User Survey Summary Report). Mean utilization was 50% for a virtualized server and 43% for a non-virtualized one. Even in virtualized resource pools, utilization is typically below 40% because: (1) most virtualized resource pools are fairly static and do not implement live migration of virtual machines (VMs); (2) most datacenter workloads have very bursty demands, leading to conservative consolidation decisions; (3) the performance of user-interactive applications drops significantly when they must contend for busy resources. The poor performance of these "critical" applications at high utilization levels is the main obstacle to increasing server utilization in datacenters. (Simon Fraser University School of Computing Science, "Maximizing Server Utilization while Meeting Critical SLAs", http://www.cs.sfu.ca/~fedorova/papers/blagodurov-im2013.pdf)
Key message: Software Defined Infrastructure (SDI) is the key to addressing these challenges across compute, storage, network, and orchestration at the rack level, dramatically improving TCO. The infrastructure adapts to the workloads, scaling in real time.

Note: hit on the point that optimized and instrumented hardware plays an important role in software-defined infrastructure. SDI dynamically orchestrates resources for different types of services and automatically optimizes for maximum resource utilization and power efficiency. This includes automating provisioning, resource monitoring, and workload placement for better utilization of datacenter resources. The underlying optimized hardware is critical to the orchestration layer, which monitors various system attributes for efficient and flexible placement of workloads.

<Transition> Creating an optimized infrastructure requires an understanding of the workloads…
SDN point of contact for the field: Rene Torres.

In networking we are following closely in the footsteps of the server evolution. When servers virtualized, the network still had manual connections and command-line interfaces; that is why servers can deploy in minutes while networks take weeks. We are moving to a new world of software-defined networking, implemented in phases.

First, the network is virtualized: network overlays give tenants control of their own networks, increasing agility, scale, and so on.

Next, network functions such as load balancing and firewalls, which today run on dedicated appliances, move into VMs on standard servers. This is great for customers, who can deploy services at the speed of software, increasing their ability to earn revenue while reducing capex. It is good for Intel because it moves Intel platforms into network functions across telco and the datacenter.

The third phase is standardizing the APIs through which software interacts with networking devices. Some people describe this as a separation of the control and data planes: a central software controller (the SDN controller) communicates over a standards-based API such as OpenFlow, allowing it to automate the control and management of networking devices. The controller, running on a standard server, interacts with the software on top of the virtualization layer to control the switches underneath, and can connect to anything in the datacenter; the network resources are pooled, agile, and flexible. The network becomes top-down driven, configurable, and extensible on a rapid, agile basis, unlike when everything is controlled in a box. It is configured and managed centrally, ensuring consistency and minimizing user error.
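The control/data-plane split described in the notes above can be illustrated with a toy sketch (all class and rule names here are invented for illustration; this is not a real OpenFlow implementation): one central controller pushes match-to-action rules into the flow tables of many switches, instead of each switch being configured by hand.

```python
class Switch:
    """Data plane: forwards purely by flow-table lookup, no local policy."""
    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # match (dst IP) -> action (output port)

    def forward(self, dst_ip):
        return self.flow_table.get(dst_ip, "drop")

class Controller:
    """Control plane: one central point programs every attached switch."""
    def __init__(self):
        self.switches = []

    def attach(self, switch):
        self.switches.append(switch)

    def install_rule(self, dst_ip, out_port):
        # One API call reconfigures the whole fabric consistently,
        # instead of box-by-box manual CLI work.
        for sw in self.switches:
            sw.flow_table[dst_ip] = out_port

ctrl = Controller()
sw1, sw2 = Switch("tor-1"), Switch("tor-2")
ctrl.attach(sw1)
ctrl.attach(sw2)
ctrl.install_rule("10.0.0.5", "port-3")
print(sw1.forward("10.0.0.5"))  # port-3
print(sw2.forward("10.9.9.9"))  # drop (no matching rule installed)
```

The point of the sketch is the notes' central claim: policy lives in one programmable place, and the switches reduce to fast table lookups.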
Call for more information (9/16/13): ONP Switch/Ethernet/Alta/Red Rock Canyon: Steve Schultz. Intel communications chipset: Frank Schapfel. Intel Network Builders: Renu Navale. QuickAssist and DPDK: Frank Schapfel and Jim St. Leger. SDN: Rene Torres.

Intel offers a number of building blocks to start the journey toward transformation of the network: 10GbE adapters (X520/X540); 10/40 switch chips (Alta); the ONP switch platform; 10/40 GbE adapters coming in '14/'15 (Fortville); Red Rock Canyon ('15 RSA compute fabric); SDN with OpenFlow 1.3; Intel Infrastructure Builders.

Software: Intel offers the QuickAssist APIs to provide seamless interfaces to any implementation of a networking workload, whether software or hardware accelerated. These interfaces sit on top of both QuickAssist-accelerated hardware and software libraries such as the Data Plane Development Kit (DPDK), a set of software libraries that accelerate packet movement and I/O on Intel architecture processors.
Intel storage innovation:

1) Storage SoCs: started with Briarwood (Atom S1200); we also have solutions for video surveillance, personal cloud, and personal media that include a security-engine offload for encryption (very low-end systems for home/SOHO use), based on Berryville.

2) Accelerators: (a) platform (microarchitecture) extensions specifically for storage that deliver faster workloads using fewer computing resources, or enable high availability (clustered storage solutions). Comparing the E5-2600 to the Xeon 5600 series: up to 2x throughput improvement, an 80% performance boost for data-deduplication workloads, and 3x higher I/O performance. (b) Software accelerators that improve performance, for instance more than 3x improvement on hashing functions for compression or deduplication.

Next-generation NVM.

Software: Lustre is a parallel distributed file system used in six of the top 10 and more than 60 of the top 100 supercomputers. Lustre file systems are scalable and can support multiple compute clusters with tens of thousands of client nodes, tens of petabytes of storage on hundreds of servers, and more than a terabyte per second (TB/s) of aggregate I/O throughput. This makes Lustre a popular choice for businesses with large datacenters, including those in meteorology, simulation, oil and gas, life science, rich media, and finance. Intel® Enterprise Edition for Lustre software helps simplify configuration, monitoring, management, and storage of high volumes of data, extending the reach of Lustre into new markets such as financial services, data analytics, pharmaceuticals, and oil and gas. When combined with the Intel® Distribution for Apache Hadoop* software, Hadoop users can access Lustre data files directly, saving time and resources.

Cache Acceleration Software: enables easy offload of I/O from primary HDD storage to fast SSD/flash media to cache "hot" data, reducing I/O latency.
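The notes above single out hashing as the hot spot that storage accelerators speed up for deduplication. A minimal sketch shows why: every fixed-size block is hashed, and only blocks with an unseen digest are stored. The function and variable names here are illustrative; a library like ISA-L optimizes exactly this inner hashing loop, while the sketch uses the stdlib's SHA-256.

```python
import hashlib

def dedup(data, block_size=4):
    store = {}    # digest -> block contents (the "chunk store")
    recipe = []   # ordered digests needed to reconstruct the data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        # Hashing every block is the dominant cost in deduplication.
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store each unique block once
        recipe.append(digest)
    return store, recipe

store, recipe = dedup(b"ABCDABCDABCDEFGH")
print(len(recipe), "blocks referenced,", len(store), "stored")
# 4 blocks referenced, 2 stored
```

Reconstruction is just `b"".join(store[d] for d in recipe)`, so the saving is the ratio of referenced to stored blocks.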
CAS automatically integrates the cache on Intel SSDs with the server DRAM cache, creating a multi-level cache that optimizes the use of system memory and automatically determines the best cache level for active data, allowing applications to perform faster with Intel CAS than when running fully on SSDs. Up to 3x performance improvement on transactional database processing; up to 20x improvement on read-intensive business analytics.
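The multi-level caching idea described above can be sketched as a toy two-tier LRU cache (assumed behavior for illustration only, not Intel CAS internals): hot data is promoted into a small fast "DRAM" tier, warm data sits in a larger "SSD" tier, and misses fall through to slow backing storage.

```python
from collections import OrderedDict

class TieredCache:
    def __init__(self, dram_size, ssd_size, backing):
        self.dram = OrderedDict()   # smallest, fastest tier (LRU order)
        self.ssd = OrderedDict()    # larger, slower tier (LRU order)
        self.dram_size, self.ssd_size = dram_size, ssd_size
        self.backing = backing      # dict standing in for HDD storage

    def _put(self, tier, size, key, value):
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > size:
            tier.popitem(last=False)   # evict least recently used

    def read(self, key):
        if key in self.dram:                    # fastest: DRAM hit
            self.dram.move_to_end(key)
            return self.dram[key], "dram"
        if key in self.ssd:                     # SSD hit: promote to DRAM
            value = self.ssd.pop(key)
            self._put(self.dram, self.dram_size, key, value)
            return value, "ssd"
        value = self.backing[key]               # miss: fetch from "HDD"
        self._put(self.ssd, self.ssd_size, key, value)
        return value, "hdd"

cache = TieredCache(dram_size=1, ssd_size=2, backing={"a": 1, "b": 2})
print(cache.read("a")[1])  # hdd  (first touch, cached on SSD tier)
print(cache.read("a")[1])  # ssd  (hit, promoted to DRAM tier)
print(cache.read("a")[1])  # dram
```

Repeated reads climb the tiers, which is the mechanism behind "automatically determines the best cache level for active data."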
Call for more detail (9/16/13): Rangeley/SoC: Sudhir Raman. Cache Acceleration Software: Susan Bobholz. Enterprise Edition for Lustre: Brent Gorda. Ecosystem enabling: Renu Navale. Next-gen SSD: Chuck Brown. Next-gen NVM: Ralph Biesemeyer/Lynn Comp.

The building blocks needed for the storage transformation start with the silicon: Xeon and Atom processors and the acceleration built into them (including acceleration for encryption with AES-NI, etc.). These products let us deliver solutions from the very high end down to the very low end, where higher latencies can be tolerated. We also have SSDs used in many storage solutions today, and over time we will move to the next generation of NVM, which will provide a very interesting capability sitting between traditional storage and memory.

Beyond the silicon, many software elements are required for this transformation, from internal software products like our Cache Acceleration Software, which lets a storage system use SSDs as a cache layer, to our Enterprise Edition for Lustre, which enables a very manageable high-performance-computing storage solution.

The ecosystem is also very important to our transformation to software-defined storage. We work with industry partners ranging from Microsoft, VMware, and Red Hat to much smaller players like Nexenta and Ceph providers. Intel is very active in OpenStack and is contributing a number of storage improvements to Swift and Cinder for object and block storage. Our contributions to OpenStack, and to our other software partners, include parts of our Storage Acceleration Libraries; the highly optimized math algorithms in ISA-L let our customers greatly increase the performance of their SDS solutions.
Field note: a link to Intel CAS throughput performance data is in the backup of this presentation. There is also a link to a proof point for SSD performance on Oracle TimesTen using Intel SSDs: a useful whitepaper showing how adding SSDs to a system configuration saves both hardware acquisition and software license costs, paying back the initial investment many times over.

There are a variety of new opportunities for solid-state disk technologies in the enterprise, enhanced by our new Intel CAS software. Intel solid-state drives come in a variety of form factors, with enterprise-class reliability and capacities approaching those of fast rotating media, and can be used as a direct replacement for rotating media. For high-performance needs in the datacenter, Intel SSDs are a great solution that will likely pay for themselves in a short time; the Oracle TimesTen whitepaper provides further examples.

For some applications, adding the Intel Cache Acceleration Software (Intel CAS) solution enables an SSD to act as a local buffer for data on rotating media in the server. This adds minimal cost while delivering near-SSD performance for all your data, a good hybrid solution for cost-conscious deployments. We can look at the performance data in backup if you are interested.
Note to Jason: factoids will come on slide 21. Use this slide to articulate the evolution and how Intel is leading with key CSPs.

Goal of this slide: put Intel in the game on the trend of disaggregation. Customers like Facebook and other big hyperscale cloud providers are asking for the modularity, flexibility, and TCO savings that disaggregation can deliver. This slide sets up Intel as a leader in delivering technologies to enable disaggregation and the future end state of complete subsystem disaggregation.

As we continue to optimize technologies for compute at scale, we must also keep looking at the most optimal way to arrange components in the rack, and at overall rack design, to enable modularity and efficiency. The industry's rack design has evolved in three phases. The first step, most of today's hyperscale rack deployments, is physical disaggregation: all non-critical sheet metal is removed and key components such as power supplies are consolidated. The next step is to disaggregate the CPU/memory from the I/O and storage subsystems; we are seeing distributed switches and shared I/O managed through a top-of-rack switch, photonic interconnects for higher speeds and fewer cables, and centralized storage appliances for lower failure rates and improved serviceability. The third step is complete subsystem modularity and separation, driving component-level refresh and increasing both utilization and ROI for each critical subsystem in the rack.
Key message: whatever the solution, Intel is actively working with partners to optimize solutions for analyzing a huge variety of data, providing new insight models, and delivering real-time or near-real-time information services.

Intel is at the core of Big Data across provisioning models and in matching the right data methods to the right data structure. The last 24 months have seen more innovation in the database product market than at any time in the previous 10 years. While the locality and distribution of compute, storage, and I/O platforms may vary, Intel has been actively optimizing its technology portfolio for relational databases, emerging technologies, and the commercially available analytical engines.
The Intel Cache Acceleration Software (Intel CAS) solution enables an SSD to act as a local buffer for data on rotating media in the server, adding minimal cost while delivering near-SSD performance for all your data: a good hybrid solution for cost-conscious deployments. Adding Intel CAS with SSDs as a cache layer accelerates Hadoop workloads by (1) mitigating the disk I/O bottleneck without wholesale HDD replacement, (2) significantly reducing workload processing time vs. HDD alone, and (3) improving CPU utilization during the map phase.

Expressway API Manager with the Apache Hadoop* distributions provides (1) codeless insertion and retrieval to and from Intel HBase using drag and drop, (2) exposure of Big Data through a REST facade, ideal for native mobile applications, and (3) a secure REST API with authentication and authorization based on OAuth and internal identity stores such as LDAP.

Lustre: starting with IDH 3.0, Intel provides three new components (a plugin for HDFS, a plugin for MapReduce, and a plugin in the IDH Manager) that together enable HPC customers to bring Hadoop to where their data already lives, on a Lustre file system. Typically the performance of Hadoop is tied to the storage nodes and vice versa; with this combination, the performance of compute and storage can scale independently.