IDC Perspectives on Big Data Outside of HPC

President of insideHPC Media at inside-BigData.com
Jul. 7, 2013

Editor's Notes

  1. Here’s a general definition of Big Data using the now-familiar schema of the “four V’s.” This isn’t specific to high performance data analysis; it applies to Big Data across all markets. To qualify as Big Data in this general context, the data set has to be large in volume, critical to analyze within a timeframe, has to include multiple types of data, and has to be worthwhile to someone, preferably with a monetary value.
  2. The emerging market for high performance data analysis is narrower than that. As I said a minute ago, it’s the market being formed by the convergence of data-intensive simulation and data-intensive analytical methods, so it’s really a union set. As the slide shows, this evolving market is very inclusive in relation to methods, types of data, and market sectors. The common denominator across these segments is the use of models that incorporate algorithmic complexity. You typically don’t find that kind of algorithmic complexity in online transaction processing or in commercial applications such as supply chain management and customer relationship management. The ultimate criterion for HPDA is that it requires HPC resources.
  3. There are important HPDA market drivers on the data ingestion side and the data output side. Data sources have become much more powerful. CERN’s Large Hadron Collider generates 1PB/second when it’s running. The Square Kilometer Array telescope will produce 1EB/day when it becomes operational in 2016. But those are extreme examples. Much more common are sensor networks for power grids and other things, gene sequencers, MRI machines, and so on. Online sales transactions produce a lot of data and a lot of opportunity for fraud. Standards, regulations and lawsuits are on the rise. Boeing stores all its engineering data for the 30-year lifetime of its commercial airplanes, not just as a reference for designing future planes but in case there’s a crash and a lawsuit. On the output side, more powerful HPC systems are kicking out lots more data in response to the growing user requirements you see listed here.
  4. Moving data costs time and money. Energy has become very expensive. It can take 100 times more energy to move the results of a calculation than to perform the calculation in the first place. It’s no wonder that oil and gas companies, for example, still rely heavily on courier services for overnight shipping of disk drives. It would take too long and cost too much to send the data over a computer network. If you’re a vendor, you have two main strategies available to you: you can speed up data movement, mainly through better interconnects, or you can minimize data movement by pre-filtering data or bringing the compute to the data; or you can both accelerate and minimize.
  5. The data in most HPDA jobs assigned to HPC resources will continue to have regular access patterns, whether the data is structured or unstructured. This means it can be partitioned and mapped onto a standard cluster or other distributed memory machine for running Hadoop or other software. But there’s a rising tide of data work that exhibits irregular access patterns and can’t take advantage of data locality processing features. Caches are highly inefficient for jobs like this. These jobs benefit from global memory combined with powerful interconnects and other data movement capabilities. Partitionable jobs are very important now, and non-partitionable jobs are becoming more important. By the way, SGI systems address both types. One general remark is that as the data analysis side of HPC expands, HPC architectures will need to become less compute-centric and offer more support for data integration and analysis. “Many current approaches to big data have been about ‘search’ – the ability to efficiently find something that you know is there in your data,” said Arvind Parthasarathi, President of YarcData. “uRiKA was purposely built to solve the problem of ‘discovery’ in big data – to discover things, relationships or patterns that you don’t know exist. By giving organizations the ability to do much faster hypothesis validation at scale and in real time, we are enabling the solution of business problems that were previously difficult or impossible – whether it be discovering the ideal patient treatment, investigating fraud, detecting threats, finding new trading algorithms or identifying counter-party risk. Basically, we are systematizing serendipity.”
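To make the partitionable case concrete, here is a minimal sketch (not any vendor’s implementation) of why regular-access work maps well onto a cluster: each chunk is processed independently and the partial results are merged, the pattern Hadoop exploits. The data and function names are hypothetical.

```python
# Illustrative sketch: partitionable analysis with regular access patterns.
# Each partition is processed independently (serially here for brevity),
# then partial results are combined -- the map/reduce pattern.
def partition(records, n_nodes):
    """Split records into n_nodes roughly equal chunks."""
    return [records[i::n_nodes] for i in range(n_nodes)]

def local_count(chunk):
    """Per-node work: count events by type."""
    counts = {}
    for event in chunk:
        counts[event] = counts.get(event, 0) + 1
    return counts

def reduce_counts(partials):
    """Combine per-node partial results into a global answer."""
    total = {}
    for p in partials:
        for k, v in p.items():
            total[k] = total.get(k, 0) + v
    return total

events = ["login", "purchase", "login", "refund", "login", "purchase"]
partials = [local_count(c) for c in partition(events, 3)]
print(reduce_counts(partials))  # counts: login 3, purchase 2, refund 1
```

Irregular-access graph work resists exactly this decomposition, because any vertex may reference any other, which is why the notes point to global memory and strong interconnects instead.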
  6. HPC servers are often used for more than one purpose. IDC classifies HPC servers according to the primary purpose they’re used for. So, an HPDA server is one that’s used more than 50% for HPDA work. As this table shows, IDC forecasts that revenue for HPC servers acquired primarily for HPDA use will grow robustly (10.4% CAGR) to approach $1 billion in 2015. Because HPDA revenue starts as such a relatively small chunk of overall HPC server revenue, the HPDA share of the overall HPC server revenue will still be in the single digits in 2015, despite the fast growth rate.
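For reference, the compound-annual-growth-rate arithmetic behind a forecast like this works as follows. The base-year figure below is a hypothetical chosen only to illustrate the math (the actual base-year number isn’t given in these notes):

```python
# CAGR projection: revenue_n = revenue_0 * (1 + rate) ** years.
# The $0.6B base figure is hypothetical, for illustration only.
def project(base_revenue, cagr, years):
    return base_revenue * (1 + cagr) ** years

# At a 10.4% CAGR, a hypothetical ~$0.6B base compounds to roughly
# $0.98B over five years -- consistent with "approach $1 billion".
print(round(project(0.6, 0.104, 5), 2))  # 0.98
```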
  7. Let’s look at some real-world use cases
  8. This slide lists some of the most prominent use cases, meaning ones where repeated sales of HPC products have been happening. Fraud detection and life sciences are emerging fastest. BTW, I didn’t include financial services here because we’ve been tracking back-office FSI analytics as part of the HPC market for more than 20 years. But FSI is an important part of the high performance data analysis market – though not an easy one to penetrate for the first time.
  9. I want to zero in more on the PayPal example because they gave me permission to use these slides and because in many ways they are representative of a larger group of commercial companies whose business requirements are pushing them up into HPC. The slides are from a talk PayPal gave at IDC’s September 2012 HPC User Forum meeting in Dearborn, Michigan. By the way, if you want a copy of this talk or any of the long list of talks on one of our first slides, just email me at sconway [at] idc.com
  10. PayPal is an eBay subsidiary and, among other things, has responsibility for detecting fraud across eBay and Skype. Five years ago, a day's worth of data was processed in batch processing overnight and fraud wasn't detected until as much as two weeks later. They realized they needed to detect fraud in real time, and for that they needed graph analysis. They were most interested in checking out collusion between multiple parties, such as when a credit card shows activity from four or more users. They needed to be able to stop that before the credit card got hit. IBM Watson on the Jeopardy game show was amazing, but it was a needle-in-a-haystack problem, meaning that Watson could only find answers that were already in its database. PayPal’s problem was different, because there was no visible needle to be found. Graph analysis let them uncover hidden relationships and behavior patterns.
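A toy sketch of the kind of relationship check described above, flagging a card linked to four or more accounts. This is purely illustrative (PayPal’s actual graph-analytics stack is not described in these notes); the data, names, and threshold are made up, and real graph analysis would also chase indirect links such as shared addresses or devices.

```python
# Toy collusion check: flag any credit card used by 4+ distinct accounts.
# Transactions are (account, card) pairs.
from collections import defaultdict

def flag_shared_cards(transactions, threshold=4):
    users_per_card = defaultdict(set)
    for account, card in transactions:
        users_per_card[card].add(account)
    return {card for card, users in users_per_card.items()
            if len(users) >= threshold}

txns = [("alice", "card1"), ("bob", "card1"), ("carol", "card1"),
        ("dave", "card1"), ("erin", "card2"), ("alice", "card2")]
print(flag_shared_cards(txns))  # {'card1'}
```

The “discovery” problem in the notes is harder than this lookup: the suspicious pattern itself is not known in advance, which is what motivates full graph traversal rather than a fixed query.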
  11. This gives you an idea of PayPal’s data volumes and HPDA requirements. These are going up all the time.
  12. Here’s what PayPal is using. For the serious fraud detection and analysis, they’re using SGI servers and storage on an InfiniBand network. For the less-challenging work that doesn’t involve pattern discovery and real-time requirements, they’re running Hadoop on a cluster. By the way, PayPal says HPC has already saved them $710 million in fraud they wouldn’t have been able to detect before.
  13. This gives you an idea of PayPal’s data volumes and HPDA requirements. These are going up all the time.
  14. For cost and growth reasons, GEICO moved to automated insurance quotes on the phone. They needed to provide quotes instantaneously, in 100 milliseconds or less. They couldn’t do these calculations nearly fast enough on the fly. GEICO’s solution was to install an HPC system and, every weekend, run updated quotes for every adult and every household in the United States. That takes 60 wall clock hours today. The phones tap into the stored quotes and return the correct one in 100 milliseconds.
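The batch-precompute-then-lookup pattern described here can be sketched as follows; the quote function, customer keys, and figures are hypothetical stand-ins, not GEICO’s actual model.

```python
# Sketch of the precompute/lookup pattern: run the expensive model for
# every customer in a weekend batch, then serve stored answers instantly.
def expensive_quote_model(profile):
    """Stand-in for the heavy actuarial calculation (hypothetical)."""
    return 500 + 10 * profile["age"] + 200 * profile["accidents"]

def weekend_batch(profiles):
    """Precompute a quote for every known profile (the long HPC run)."""
    return {pid: expensive_quote_model(p) for pid, p in profiles.items()}

profiles = {
    "cust-1": {"age": 30, "accidents": 0},
    "cust-2": {"age": 45, "accidents": 2},
}
quotes = weekend_batch(profiles)

# Phone-time path: a stored-table lookup, easily inside 100 ms.
print(quotes["cust-2"])  # 1350
```

The design trade-off is the one the notes describe: spend 60 wall-clock hours of batch compute so the latency-critical path becomes a lookup rather than a calculation.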
  15. Here’s a real-world example of one of the biggest names in global package delivery. Their problem is not so different from PayPal’s. This courier service is doing real-time fraud detection on huge volumes of packages that come into their sorting facility from many locations and leave the facility for many other locations around the world. They ran a difficult benchmark. The winner hasn’t been publicly announced yet, but IDC’s back channels tell us the vendor has a 3-letter name that starts with S.
  16. Schrödinger is a global life sciences software company with offices in Munich and Mannheim. One of the major things they do is use molecular dynamics to identify promising candidates for new drugs to combat cancer and other diseases – and it seems they’ve been using the cloud for this High Performance Data Analysis problem. That’s not so surprising, since molecular dynamics codes are often highly parallel.
  17. Here’s the architecture they used. Note that they were already using HPC in their on-premises data center, but the resources weren’t big enough for this task. That’s why they burst out to Amazon EC2 using a software management layer from Cycle Computing to access more than 50,000 additional cores. Bringing a new drug to market can cost as much as £10 billion and a decade of time, so security is a major concern with commercial drug discovery. Apparently, Schrödinger felt confident about the cloud security measures.
  18. You may have seen the recent news that Optum, which is part of United Health Group, is teaming with the Mayo Clinic to build a huge center in Cambridge, Massachusetts to lay the research groundwork for outcomes-based medicine. They’ll have more than 100 million patient records at their disposal for this enormous data-intensive work. They’ll be using data-intensive methods to look at other aspects of health care, too. A week ago, United Health issued a press release in which they said they believe that improved efficiencies alone could reduce Medicare costs by about 40%, obviating much of the need for the major reforms the political parties have been fighting about.
  19. In the U.S., the largest urban gangs are the Crips and the Bloods. They’re rival gangs that are at each other’s throats all the time, fighting for money and power. Both gangs are national in scope, but the national organizations aren’t that strong. The branches of these gangs in each city have a lot of autonomy to do what they want. What you see here, again in blurred form, was something that astounded the police department of Atlanta, Georgia, a city with about 4 million inhabitants. Through real-time monitoring of social networks, they were able to witness, as it happened, the planned merger of these rival gangs in their city. This information allowed the police to adapt their own plans accordingly.
  20. In summary, we defined HPDA and told you that IDC is forecasting rapid growth from a small base. HPDA is about the convergence of data-intensive HPC and high-end commercial analytics. One of the most interesting aspects of this, to us, is that the demands of the commercial market are moving this along faster in the commercial sector than in the traditional HPC market. PayPal is a great example of this (story of how PayPal was shy about presenting at User Forum – both sides should be learning from each other). On the analytics side, some attractive use cases are already out there. In the time allotted to us here, we described some of the more prominent ones, but there are many others. Most of the work will be done on clusters, but some economically important use cases need more capable architectures, especially for graph analytics. Many of the large investment firms are IDC clients, so our growth estimates tend to err on the side of conservatism. There is potential for the HPDA market to grow faster than our current forecast. But we talk with a lot of people and we update the forecasts often, so we don’t get too far off the mark.
  21. This is a partial list of the user and vendor talks on this topic that we’ve lined up in the past two years as part of the HPC User Forum. IDC has operated the HPC User Forum since 1999 for a volunteer steering committee made up of senior HPC people from government, industry and academia – organizations like Boeing, GM, Ford, NSF and others. We hold meetings across the world, and the talks listed here include perspectives on High Performance Data Analysis from the Americas, Europe and Asia. I’ll ask Chirag to explain how we define High Performance Data Analysis. I’ll return later to walk you through some real-world use cases. Chirag...