© 2006 IBM Corporation




                          Enabling ad-hoc
                           Analytic Apps
            ...
Hadoop World ’09



 Emerging Technology - What do we work on?




      Making Hadoop
       accessible to
         busin...
Hadoop World ’09


 New Intelligence - Big Data

     Nearly 15 petabytes of data are created
     every day — eight times...
Hadoop World ’09



 New Intelligence - New Class of Application on Horizon?

     Internet Evolution: A web of data
     ...
Hadoop World ’09



 New Intelligence - New Class of Application on Horizon?


   Hear business users asking for
   the ab...
Hadoop World ’09


 Let's Talk Customer Scenarios - BBC


                                                          Busine...
Hadoop World ’09


 Let's Talk Customer Scenarios - Thomson Reuters
                                                     ...
Hadoop World ’09


 IBM Emerging Technology Project: M2

                     What is it?
                     An insight ...
Hadoop World ’09


 M2 -> Demo
                                                                Business Questions
        ...
Hadoop World ’09


 What's Under the Covers: Hadoop


  Emergence of map/reduce programming
  model for a new class of web...
Hadoop World ’09



 IBM Emerging Technology Project: M2 Architectural Components


      Expanding upon the Hadoop stack
...
Hadoop World ’09


 Conclusions




                   In God we trust



October 2009                                 SWG...
Hadoop World ’09


 Conclusions




    …all others bring data



October 2009                                 SWG Emergin...
Hadoop World ’09


 Conclusions


         Enterprises quickly evolving their thinking
         from a Database strategy t...
Hadoop World ’09


 Almost The End


Selecting customer proof
  of concept projects


               INTERESTED?
         ...
Hw09 Enabling Ad Hoc Analytics At Web Scale

  1. © 2006 IBM Corporation. Enabling ad-hoc Analytic Apps with Hadoop. Rod Smith (rod.smith@us.ibm.com). Friday, October 2, 2009
  2. Hadoop World ’09. Emerging Technology - What do we work on? Making Hadoop accessible to business professionals. October 2009, SWG Emerging Internet Technology, IBM Software Group
  3. New Intelligence - Big Data
     • Nearly 15 petabytes of data are created every day, eight times more than the information in all the libraries in the U.S.
     • Volume of data in enterprises is doubling approximately every 3 years (Forrester Research); includes structured and unstructured data, excludes rich media
     • Costs to find, collect & analyze data are decreasing significantly as web innovation proceeds
     • Content is untapped value for business insights & intelligence
  4. New Intelligence - New Class of Application on Horizon?
     • Internet Evolution: a web of data sources, services for exploring & manipulating data, and ways that users can connect them together (Tom Coates/Yahoo™)
     • Diagram labels: Extract, Gather, Explore
     • Enterprises recognizing potential of leveraging the broader web for business intelligence coverage, as well as for internal data
     • Next wave of content-centric webApps emerging: long(er) running data collection & analytic applications
  5. New Intelligence - New Class of Application on Horizon?
     • Internet Evolution: a web of data sources, services for exploring & manipulating data, and ways that users can connect them together (Tom Coates/Yahoo™)
     • Enterprises recognizing potential of leveraging the broader web for business intelligence coverage, as well as for internal data
     • Next wave of content-centric webApps emerging: long(er) running data collection & analytic applications
  6. New Intelligence - New Class of Application on Horizon?
     • Hear business users asking for the ability to directly manipulate, analyze & remix massive data sources & services
     • LOB: “… Google whetted my appetite... I want more customizable analytics with me in the driver's seat…”
     • Leveraging easy-to-use, rich data manipulation metaphors like spreadsheets, etc.
     • Rich visualizations to quickly identify insights
  7. New Intelligence - New Class of Application on Horizon?
     • Hear business users asking for the ability to directly manipulate, analyze & remix massive data sources & services
     • LOB: “… Google whetted my appetite... I want more customizable analytics with me in the driver's seat…”
     • Leveraging easy-to-use, rich data manipulation metaphors like spreadsheets, etc.
     • Rich visualizations to quickly identify insights
     • Callout: Rich Spectrum DIY Analytic Applications Emerging
  8. Let's Talk Customer Scenarios - BBC
     BBC Digital Democracy Project: Achieving Increased Government Transparency
     Business Questions
     • Name names: Who is doing what, who isn't doing what
     • Overlay voting record with demographic & voting records over time
     • Buzz - what are people talking about?
     • Visualize content relationships
     Knowledge of Interest:
     • Members of Parliament (MPs)
     • Bills, Debates, Voting Districts
     Web Content To Gather:
     • UK Parliament Web Site
     • Timeframe: 10+ years
  9. Let's Talk Customer Scenarios - Thomson Reuters
     Enrich Trader's Desktop
     Enhancement: Timely aggregation & analytics of content originating from public internet sites
     Business Questions
     • NewsBuzz: What are the headlines? What are not the headlines but still in focus?
     • OpinionMonitor: Who is saying what? What are the debate topics?
     • NewsTimeline: Chronology (pulse) of headline news?
     • TopicCloud: Tag based topic metrics
     • IssueAnalytics: Link backs to semantically related news
     Scenario
     • Gather unstructured data from anywhere between 200 to 2000 data sources - every 15 minutes
     • Perform preprocessing (search, transform, index) over each source
     • Publish harvested content for distributed content services and downstream Mashups
     Knowledge of Interest:
     • People, places, events
     Web Content To Gather:
     • ~118 3rd Party Financial News Services and Blogs, including: BBC, CNN, Yahoo News, Financial Times, NY Times, The Big Picture, Fox News, PR Newswire, Market Watch, World Press, Forbes, Google News, Wall Street Journal, MSNBC, The Sun, ZDNet
  10. IBM Emerging Technology Project: M2
      What is it?
      • An insight engine for enabling ad-hoc business insights for business users, at web scale
      How does it work? Discovery Process
      1. Point M2 to data sources of interest
         • Unstructured web data, feeds, XML, etc.
      2. Transform data into a form that can be analyzed
         • Unstructured data becomes semi-structured data
         • Example: name: Rod Smith, employer: IBM, state: GA
         • Apply analytics - enriching the data
      3. “What if” tooling: browser-based visual front end with a spreadsheet metaphor to create worksheets for exploring/visualizing the data
      What's different?
      • Unlocking insights embedded in unstructured data
      • Analyzing data previously unavailable to analyze
  11. M2 -> Demo
      Project: Improve IP Portfolio Analysis for Mergers & Acquisitions
      “...please collect all US Patent filings… then let’s do…”
      Business Questions
      • How much is a target company worth?
      • What are the high-value areas of their portfolio?
      • Explored cited patent topics, litigated patents
      Knowledge of Interest:
      • Patents ranked by citation, e.g. how often a patent was referenced determines value
      • Corporate genealogies, IP ownership roll-up
      • Augment analysis with items affecting IP value, inventor affiliation, citation rank by time
      Web Content To Gather:
      • Gathered 1.4m patent docs from USPTO
      • 1991-2007 case records from the United States Court of Appeals for the Federal Circuit (CAFC)
  12. What's Under the Covers: Hadoop
      • Emergence of map/reduce programming model for a new class of webApp
      • Hadoop: provides a framework for large scale parallel processing map/reduce apps (Apache projects led by Yahoo)
      • Offers simplicity of “programming” - looks like a simple single threaded app model for developers
      • Handles big data - scalable storage across machine clusters (think read-only file system)
      • Deployment: no application knowledge of runtime or OS or cloud necessary
      • Today - setting up, coding Hadoop jobs in Java, etc. is the domain of skilled Java engineers
  13. IBM Emerging Technology Project: M2 Architectural Components
      Expanding upon the Hadoop stack
      • Visual tooling builds extensively on Pig
      M2 Architecture Characteristics:
      • Extensible via UDFs
      • REST API for customer choice of analytic service/engine
      • REST API for choice of visualization packages
      • Export content as feeds, XML, etc.
      • ...more to come
  14. Conclusions: In God we trust
  15. Conclusions: …all others bring data
  16. Conclusions
      • Enterprises are quickly evolving their thinking from a Database strategy to a Data Strategy encompassing unstructured & structured content
      • Repeatable business patterns are emerging in a broad range of industries
      • Hadoop has the potential to be the platform for a broad range of solutions, from web-based analytics -> business event processing -> collaboration
  17. Almost The End
      • Selecting customer proof of concept projects
      • INTERESTED? www-01.ibm.com/software/ebusiness/jstart/about.html
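
The map/reduce model named on the Hadoop slide, applied to the "patents ranked by citation" idea from the M2 demo, might look roughly like the sketch below. It is an illustration only and not M2's code: it runs in a single Python process rather than on Hadoop, and the sample data and every function name (PATENTS, map_citations, shuffle, reduce_citations, rank_by_citation) are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical sample data: patent id -> ids of the patents it cites.
PATENTS = {
    "P1": ["P3"],
    "P2": ["P1", "P3"],
    "P3": [],
    "P4": ["P1", "P3"],
}


def map_citations(patent_id, cited_ids):
    """Map step: emit a (cited_patent, 1) pair for each citation in one document."""
    for cited in cited_ids:
        yield cited, 1


def shuffle(pairs):
    """Group emitted values by key, the job a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_citations(cited_id, counts):
    """Reduce step: total the citation counts collected for one patent."""
    return cited_id, sum(counts)


def rank_by_citation(patents):
    """Run map, shuffle and reduce in-process, then sort most-cited first."""
    mapped = (
        pair
        for patent_id, cited_ids in patents.items()
        for pair in map_citations(patent_id, cited_ids)
    )
    reduced = [reduce_citations(key, values) for key, values in shuffle(mapped).items()]
    # Sort by descending count, breaking ties by patent id.
    return sorted(reduced, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for patent_id, count in rank_by_citation(PATENTS):
        print(patent_id, count)
```

Each function handles one record or one key at a time, which is the "simple single threaded app model" the Hadoop slide describes; on a cluster, the framework would take over the grouping and distribution that shuffle() imitates here. Patents that are never cited do not appear in the ranking.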