Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterprise Rock Stars - Martin Hall, Karmasphere

This presentation will explore how Hadoop and Big Data are re-inventing enterprise workflows, and the pivotal role of the Data Analyst. It will examine the changing face of analytics and the streamlining of iterative queries through evolved user interfaces. The speaker will cut through the hype around “shorter time to insight” and explain how combining Hadoop and SQL-based analytics helps companies discover emergent trends hidden in unstructured data, without having to retrain data miners or restaff. In particular, it will highlight the changes this paradigm brings to Big Data analysis and illustrate, step by step, how analysts can now connect to Big Data platforms, assemble working data sets from disparate sources, analyze and mine that data for actionable insight, publish the results as visualizations and as feeds for reporting tools, and operationalize MapReduce and Big Data outcomes into company workflows – all without touching the command line.

  • I’m going to talk about what we’re seeing: how forces of change are affecting the way IT operates, the growing role of data, and how data professionals are moving front and center to play major roles in empowering businesses.
  • I’m going to cover 4 main themes. First, I’ll talk about the traditional requirements-driven business. Then I’ll summarize what we see as the major factors driving change and the opportunities for us all. The bulk of the presentation is about how all of us are becoming agents of change and how to meet the needs of our roles in this new world of Hadoop-enabled Big Data.
  • So let’s take a look at the traditional business and, in particular, how it deals with data… The result of all this is that business insight is limited, both in scope and in time.
  • The forces of change are not just about technology …
  • From working with thousands of users, customers and partners, we’re seeing a blueprint emerge for the new data-centric business. It’s about enablement. In particular, it’s about the utilization of all of a business’s data and the enablement of data professionals and analysts. And useful data is not limited to what a business currently owns. Data marketplaces, aggregators and specialist providers across many industries are opening up their data, providing APIs and creating the promise of even more business and market-relevant insight.
  • Listening to yesterday’s keynotes, much of what Larry Feinsmith said rings true and is aligned with what we see and hear, not just from financial services but across multiple verticals. Successful IT organizations are enabling data professionals to be self-service. Whether it’s through on-premise or, increasingly, on-demand in-the-cloud services, this has to be the mantra of the successful future business.
  • And data professionals are at the heart of this change. They can choose to be concrete or catalyst. We are all innovators. Innovation entails risk and many of us sometimes baulk at it. But our roles in this new industry of data are growing, our potential to impact our businesses is growing and, because of the value we bring and simple supply/demand economics, we will get paid more.
  • I’ve been at all 3 Hadoop World events. It’s interesting to reflect on how people thought about Hadoop two and even one year ago.
  • But we’re all becoming, and need to be, more sophisticated. Installing Hadoop is only step 1. A devops team bringing up a CDH cluster, or an inspired developer firing up a Hadoop cluster using Elastic MapReduce, is just the beginning. Businesses are getting smarter about understanding the potential of Hadoop, but also about how to plan for success and what a successful Hadoop-based stack looks like. They are also getting smarter about which skilled workers they enable to access that stack, and how.
  • What we see are 3 key classes of data professionals. IT is clearly key to the infrastructure. But choosing your Hadoop provider and determining whether you’re going on-premise, in-cloud or with a hybrid strategy is just step 1. Businesses are getting smarter.
  • More sophisticated thinking now takes into account the democratization of access. This is the common fabric we’re seeing.
  • Hadoop open source projects also fall into these categories, with the data management projects focused on innovation in the core platform and the analytics projects creating the core technology for analysis.
  • Data engineers often implement existing algorithms in MapReduce or take the insights created by data analysts. They also build distributed functions that the analysts can use.
  • So let’s take a look at what we’ve found about what the data analysts and engineers need. It’s not just about the command line any more. As you grow the teams of professionals accessing Hadoop, it’s no longer enough to give them a command line, SSH or a rudimentary web interface. People have skills, and skills flourish faster in high-productivity environments.
  • So what does a workflow optimized for big data look like? We think it needs to provide 4 key workflow stages. It has to enable you to connect to any Hadoop cluster, no matter where it is located and which company or organization it comes from. It needs to provide easy access to data so you can point, click and automatically understand the data as it’s prepared for analysis. Most importantly, it needs to provide an easy-to-use environment for iterative analysis with abstraction and visualization capabilities. Finally, it needs to provide the ability to act on any and every insight generated. I’ll walk you through all of these…
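The notes above mention data engineers implementing algorithms in MapReduce. As a rough illustration of that programming model only, here is a minimal sketch in plain Python that counts words in three phases (map, shuffle, reduce). It does not use Hadoop or any Karmasphere product; the function names and the sample records are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is simulated in memory.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three phases in order on an in-memory list of strings."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a large file.
    sample = ["big data big insight", "data analysts mine data"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions would run in parallel across many machines, which is what makes the same pattern usable on data sets too large for a single process.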

Transcript

  • 1. Data Professionals: The New Enterprise Rock Stars www.karmasphere.com © Karmasphere 2011 All rights reserved. Karmasphere Proprietary and Confidential.
  • 2. IT Transition (diagram over a binary-digit background; labels: Business Requirements, Forces of change, Agents, Empowerment)
  • 3. Overview • The Requirements-Driven Business • The Forces of Change • The Empowerment-Driven Business • Data Professionals: The Agents of Change • Who are they? • What do they need?
  • 4. The Requirements-Driven Business • IT driven by business requirements • Slow • Rigid • Subset analysis • Data access is limited • Costly • Innovation is hard Result: Fixed, limited and backward-looking insights
  • 5. The Forces of Change • Data is Competitive Advantage • Open Source Economics • Data Volume & Variety • Limits of Existing Technology
  • 6. The Empowerment-Driven Business • IT empowers the business • Fosters innovation • Creates self-service environment • Steps aside • Fast • Flexible • Analysis of all data • Data access is democratized • Affordable • Innovation facilitated Result: Flexible, unlimited and forward-looking insights
  • 7. “We have 250,000 people who don’t want to rely on IT. They want to be self-service” Larry Feinsmith, JP Morgan
  • 8. Data Professionals: The Agents of Change • Key to the data-centric, empowerment-driven business • Catalysts for business insight and change • Roles have greater responsibility and business impact • Will get paid more!
  • 9. IT Data IT Hadoop
  • 10. “It’s not enough to have a platform that only Java developers can use” Mike Olson, Cloudera
  • 11. Data Professionals Data Analysts Data Engineers Business Operations Data IT Hadoop
  • 12. Hadoop and Big Data Analytics in the Data Fabric
  • 13. Open Source Apache Hadoop SQL, Data Flow Languages, Predictive Analytics MapReduce, HDFS, Serialization, Coordination, Scalable database …
  • 14. Big Data Professional Roles (Who: What) • Data Analyst: Access all the data they need on one or more clusters; work easily with structured and unstructured data; generate, share and integrate insights with the business • Data Engineer: Create fully optimized data processing jobs (transformations, filtering etc.); build distributed M/R algorithms for use by Data Analysts • IT Data Management: Choose, install, manage, provision and scale Hadoop clusters
  • 15. How to Empower Data Analysts & Engineers • Mine large volumes of data daily • Look for trends and patterns that may not be picked up with structured data or tools • Determine how those trends and patterns can help predict the business future • Use discoveries to: create new products/services, optimize operations, crystallize customer view • Purpose-built workspaces • Native support for Hadoop • Tight integration with on-premise and in-cloud Hadoop infrastructure • Wizard-driven workflows • Familiar, powerful and high productivity paradigms and languages • Open source compatibility
  • 16. The Big Data Analytics Workflow
  • 17. Step One: Access. Connect easily to any Hadoop cluster. View Clusters and Navigate Data Graphically • Create and Share Connections • Browse Data Graphically – HDFS, Local, Networked and Cloud File Systems. Accesses Any and Every Hadoop Cluster • On-premise • In-cloud • Behind firewalls
  • 18. Step Two: Assemble. Explore, integrate and organize data of any type, on the fly, to prepare for analysis. Get wizard-driven, automatic data parsing • LZO, GZip and bzip files • JSON, RCFiles, Text, Extended Text, Sequence, Binary, and custom types • Tab, comma, semicolon, space... Works with standard Hadoop metastore. Assemble: Prepare data for analysis
  • 19. Step Three: Analyze. Easily navigate full data sets; find patterns, trends and insights. Write powerful queries easily • Syntax Checking & Auto complete • 180 predefined functions or add custom. Get assistance and visualization • Formatting • Filtering • Sorting • Charting. Learn and iterate quickly • Easily save and reuse queries • Save and share results. Analyze: Explore, query and visualize data
  • 20. Step Four: Act. Share, integrate and operationalize results, queries and visualizations. Share: Save results and queries to Hadoop and JDBC databases. Integrate: Use results with existing products including Excel, Tableau and BI products. Operationalize: Save results to operational data stores including legacy and Big Data stores. Analyze: Explore, query and visualize data
  • 21. For The Data Analyst • Graphical workspace for big data analysis of any type and size • Integrated Big Data Analytics workflow • Familiar and powerful user interface • SQL-based • Integrated with on-premise and in-cloud Hadoop • 100% Apache Hive compatible
  • 22. Karmasphere Analyst - Access Rich to create Screen shot
  • 23. Karmasphere Analyst – Assemble
  • 24. Karmasphere Analyst - Interact and Analyze
  • 25. Karmasphere Analyst – Visualize and Act
  • 26. For The Data Engineer • Guided MapReduce development, with wizards and workflow • Prototype on the desktop, debug on the cluster • Profile and optimize Hadoop jobs with graphical monitoring • Package and export jobs for external submission to a cluster Rich to Get Screen Shot of Studio Workflow
  • 27. If You’re Interested in Karmasphere • Download free versions and virtual appliances, including Cloudera CDH + Karmasphere • http://www.karmasphere.com • Get Pay-As-You-Go Hadoop analytics software from • http://aws.amazon.com/elasticmapreduce
  • 28. Thank You martinh@karmasphere.com www.karmasphere.com