
Strategic Planning for Big Data - Avoid Re-Inventing the Wheel When Seeking Big Data Bliss

These are the slides I used in a webinar I did with BMC on April 9th, 2014. You can see all of the slides and watch the recorded presentation over at BrightTalk: https://www.brighttalk.com/webcast/9059/103135

Published in: Technology
  • Let’s start from the technologies and build up to the business reasons. Usually, “Big Data” is a synonym for “Hadoop.” But it’s larger. In our work studying this field, we like to think of a larger aperture, Total Data. The cliché three Vs are at the center, with the ability to keep more data than ever, and actually do something with it. But you must remember that there are many other moving parts in the circles of big data, not to be confused with Dante’s circles of hell ;) This all amounts to “Total Data”: looking at how all your data-driven processes fit together… Wrapped around all of this is the domain knowledge that allows you to know which questions to ask, and then the business case and process that figures out how to profit from those answers.
  • The excitement around Big Data comes, we believe, from the economics of it. Sure, the technology is powerful and affords people the ability to keep more data than ever and analyze it. But this is driven by the dramatic cost reductions possible. For the enterprise use of IT, cost is often the limiting factor, after all. So, with Big Data, we seem to have analytical abilities that were previously within the reach of only spies and scientists (with supercomputers). Everyone must be using it, right?
  • Just like you, we analysts hear about big data all the time. If I hear about “data explosion” one more time I might shoot myself, or at least go live in a log cabin in the woods. When you look at one of the tracers for big data use, consumption of storage for big data vs. other storage uses in the enterprise, you see that it’s actually small. So while we hear about a lot of potential, we’re clearly early on in the deployment of big data into the mainstream, which is both a downer and encouraging. It’s a downer because you’d like hype to match reality. It’s encouraging because it means that if you’re not doing it, you’re not behind, and seem wise. You have time to figure it out and sort through the best way forward.
  • So what is Hadoop good for? You shouldn’t think of it as a zero-sum game with existing BI. From what we can tell, it’s not replacing Enterprise Data Warehouses across the board, but is instead additive. Hence our circles of Total Data that look at the whole picture.
  • Cases: Reducing the cost of storing data, as shown earlier – company reduced data storage budget by $4m/year. Video camera analysis to track in-store people movement. Dynamic pricing based on a customer’s demographics from social networks, or other sources – imagine shelf prices updating in real-time (http://technologyadvice.com/business-intelligence/blog/smart-shelves-will-identify-grocery-shoppers-deliver-custom-advertisements/). Looking at optimizing POs and other paperwork. NYTimes converting archives to PDFs.
  • Like most off-the-web projects, the ease of use depends on your own sophistication and the time you have to futz with it. Hadoop is clearly a de facto standard, but it’s still difficult and skills are in high demand. Unless you’re in a green field situation, focus on how it can be additive to what you have. Thankfully, there are lots of options out there for help: from vendors, consulting, books, blogs, etc.
  • What are the next steps? Put broadly, here are three things you can do, and a bonus item.
  • Strategic Planning for Big Data - Avoid Re-Inventing the Wheel When Seeking Big Data Bliss

    1. Avoid Re-Inventing the Wheel When Seeking Big Data Bliss. April 9th, 2014
    2. Michael Coté, Research Director, Infrastructure Software. cote@451research.com, @cote – http://cote.io. Responsible for systems management, application development, cloud software, and misc. “infrastructure software” agenda. Worked at Dell in corporate strategy, as an analyst for 6+ years, software developer for 10+ years. Joe Goldberg, BMC Control-M Solutions Marketing. joe_goldberg@bmc.com, @GoldbergJoe. Joe is an IT professional with over 35 years of experience in the design, development, implementation, sales and marketing of enterprise solutions to Global 2000 organizations. Joe has been active in helping BMC products leverage new technology to deliver market-leading solutions. BMC slides were omitted from this presentation. See full presentation and recording here: https://www.brighttalk.com/webcast/9059/103135
    3. Usually, “Big Data” is a synonym for “Hadoop”: not so fast. • Processing and analysis of very large data sets in their entirety • Massively parallel processing approaches • Both structured and multi-structured data • External (social) and corporate data • Schema-free and schema-on-read data storage/analysis • Predictive analytics as a fundamental BI tool • Reflection of collective intelligence • Identification of new patterns in data • Stream processing of sensor and machine-generated data • Native, SQL-based analysis of data in Hadoop and HBase • In-memory databases for rapid data ingestion • Real-time analysis of data prior to storage • TOTAL DATA: Management alongside existing data technologies. Source: “Big data reconsidered: it's the economics, stupid,” 451 Research, Dec 2013.
    4. Big Data: “it’s the economics, stupid”. Another example: a provider of real-time information and analysis to the media and communications industries • Moved from storing 1% of data for 60 days in EDW @ $100,000/TB • To 100% of data for a year in Hadoop @ $900/TB • By migrating to Hadoop and open source databases the company identified over $4m in cost savings over two years. Both companies have retained the use of traditional databases/warehousing, but Hadoop and other big data technologies add cost-effectiveness and flexibility. “The price point that Hadoop comes in at is transformational. Hadoop has the ability to drive down operational cost and improve resource efficiency.” – Global Head of Architecture, Global Bank
    5. ‘Big data’ not significant in core infrastructure yet. Average total storage capacity (TBs) and total storage footprint by workload illustrate the low level of adoption today. [Chart: storage capacity in TB, axis 0 to 8000, for 2012 and 2013, by workload: DW and DBMS, Unstructured file, Virtualized server/OS, Backup, Archive, Other, Big data/Hadoop; labeled values: 3%, 3%.] Source: 2012: 451 Research The Info Pro Storage – Wave 16 | n=214; 2013: 451 Research The Info Pro Storage – Wave 17 | n=200
    6. Hadoop vs. EDW – not so much. Survey question: Describe the relationship between Hadoop and the data warehouse within your organization. [Chart: values as listed: 13.30%, 31.60%, 10.20%, 37.80%, 40.80%; categories as listed: Hadoop replacing data warehouse, Permanently migrating workloads to Hadoop, Temporarily offloading workloads to Hadoop, Hadoop for workloads not previously on DW, Hadoop not used; groupings: Threatening, and Non-threatening or additive.] Source: "Hadoop: a framework in search of a metaphor," 451 Survey conducted Sep/Oct 2013, sample=98.
    7. What’s big data good for? • The processing and analysis of very large data sets in their entirety • Increased adoption of massively parallel processing approaches • Storage and analysis of both structured and unstructured data • Integration of external (social) and corporate data for more complete perspective • Ad hoc analytic approaches to identify new patterns in data • Interactive, native, SQL-based analysis of data in Hadoop and HBase • Predictive analytics as a fundamental component of BI strategies • Machine-learning algorithms automate the reflection of collective intelligence • Increased adoption of in-memory databases for rapid data ingestion • Stream processing of sensor and other machine-generated data/events • Real-time analysis of data prior to storage within the data warehouse/Hadoop • “MR-ETL” – pre-processing data for EDW loads. Source: “Big data reconsidered: it's the economics, stupid,” 451 Research, Dec 2013.
    8. How to think strategically about big data. ‘Big Data’ is the realization of competitive advantage by storing, processing and analyzing data that was previously ignored due to the cost and functional limitations of traditional data management technologies to handle its volume, velocity and variety.
    9. Zeroing in on Hadoop - barriers to Hadoop adoption • Hadoop is complex to configure, deploy and manage • Skilled staff are at a premium • Enterprises want to make the most of existing tools/skills • Enterprises are still trying to understand where Hadoop fits in their data management landscape
    10. Your homework… 1. What business problem are you solving? What questions will you ask the data? 2. Baseline existing costs, monitor new costs – did you save? 3. Monitoring and managing your new grid 4. Bonus: self-service access for ad hoc analysts
    11. Thanks! @cote – cote@451research.com - http://cote.io
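
The storage economics quoted on slide 4 ($100,000/TB in the EDW vs. $900/TB in Hadoop) can be worked through as a rough calculation, in the spirit of the "baseline existing costs" homework item. This is only a sketch: the per-TB prices and retention policies come from the slide, but the daily data volume (DAILY_TB) and the simple "volume x price" cost model are assumptions made for illustration, so the output does not reproduce the $4m savings figure, which depends on volumes the slides do not give.

```python
# Per-terabyte prices quoted on slide 4.
EDW_COST_PER_TB = 100_000    # USD per TB in the enterprise data warehouse
HADOOP_COST_PER_TB = 900     # USD per TB in Hadoop

# ASSUMPTION (not from the slides): raw data produced per day, in TB.
DAILY_TB = 1


def retained_tb(daily_tb, percent_kept, retention_days):
    """Volume held at steady state: daily volume x share kept x days retained."""
    return daily_tb * percent_kept * retention_days / 100


def storage_cost(tb, cost_per_tb):
    """Simplified cost model (an assumption): volume times price per TB."""
    return tb * cost_per_tb


# Retention policies from slide 4.
sample_tb = retained_tb(DAILY_TB, 1, 60)        # 1% of data for 60 days
full_year_tb = retained_tb(DAILY_TB, 100, 365)  # 100% of data for a year

edw_sample_cost = storage_cost(sample_tb, EDW_COST_PER_TB)
full_year_edw_cost = storage_cost(full_year_tb, EDW_COST_PER_TB)
full_year_hadoop_cost = storage_cost(full_year_tb, HADOOP_COST_PER_TB)
price_ratio = EDW_COST_PER_TB / HADOOP_COST_PER_TB

if __name__ == "__main__":
    print(f"EDW, 1% for 60 days:       ${edw_sample_cost:,.0f}")
    print(f"EDW, 100% for a year:      ${full_year_edw_cost:,.0f}")
    print(f"Hadoop, 100% for a year:   ${full_year_hadoop_cost:,.0f}")
    print(f"Price per TB, EDW/Hadoop:  {price_ratio:.0f}x")
```

Under these assumptions, keeping everything for a year is far out of reach at the EDW price and becomes plausible at the Hadoop price, which is the point behind "data that was previously ignored due to cost." Swap in your own measured volumes and prices to build the baseline.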
