Hw09 Enabling Ad Hoc Analytics At Web Scale

Transcript

  • 1. Enabling ad-hoc Analytic Apps with Hadoop. Rod Smith (rod.smith@us.ibm.com). Friday, October 2, 2009. © 2006 IBM Corporation
  • 2. Hadoop World ’09 Emerging Technology - What do we work on? Making Hadoop accessible to business professionals October 2009 SWG Emerging Internet Technology IBM Software Group Friday, October 2, 2009
  • 3. New Intelligence - Big Data
      • Nearly 15 petabytes of data are created every day, eight times more than the information in all the libraries in the U.S.
      • Volume of data in enterprises is doubling approximately every 3 years (Forrester Research)
          • Includes structured and unstructured data, excludes rich media
      • Costs to find, collect & analyze data are decreasing significantly as web innovation proceeds
      • Content is untapped value for business insights & intelligence
  • 4. New Intelligence - New Class of Application on Horizon?
      • Internet Evolution: "A web of data sources, services for exploring & manipulating data, and ways that users can connect them together" (Tom Coates/Yahoo™)
      • Diagram labels: Extract, Gather, Explore
      • Enterprises recognizing potential of leveraging the broader web for business intelligence coverage, as well as for internal data
      • Next wave of content-centric webApps emerging
          • Long(er) running data collection & analytic applications
  • 6. New Intelligence - New Class of Application on Horizon?
      • We hear business users asking for the ability to directly manipulate, analyze & remix massive data sources & services
          • LOB: “… Google whetted my appetite... I want more customizable analytics with me in the driver's seat…”
      • Leveraging easy-to-use, rich data manipulation metaphors like spreadsheets, etc.
      • Rich visualizations to quickly identify insights
  • 7. New Intelligence - New Class of Application on Horizon?
      • We hear business users asking for the ability to directly manipulate, analyze & remix massive data sources & services
          • LOB: “… Google whetted my appetite... I want more customizable analytics with me in the driver's seat…”
      • Leveraging easy-to-use, rich data manipulation metaphors like spreadsheets, etc.
      • Rich visualizations to quickly identify insights
      • Callout: Rich Spectrum DIY Analytic Applications Emerging
  • 8. Let's Talk Customer Scenarios - BBC
      • Project: BBC Digital Democracy Project, Achieving Increased Government Transparency
      • Business Questions
          • Name names: Who is doing what, who isn't doing what
          • Overlay voting record with demographic & voting records over time
          • Buzz - what are people talking about?
          • Visualize content relationships
      • Knowledge of Interest:
          • Members of Parliament (MPs)
          • Bills, Debates, Voting Districts
      • Web Content To Gather:
          • UK Parliament Web Site
          • Timeframe: 10+ years
  • 9. Let's Talk Customer Scenarios - Thomson Reuters
      • Project: Enrich Trader's Desktop Enhancement. Timely aggregation & analytics of content originating from public internet sites
      • Business Questions
          • NewsBuzz: What are the headlines? What are not the headlines but still in focus?
          • OpinionMonitor: Who is saying what? What are the debate topics?
          • NewsTimeline: Chronology (pulse) of headline news?
          • TopicCloud: Tag based topic metrix
          • IssueAnalytics: Link backs to semantically related news
      • Scenario
          • Gather unstructured data from anywhere between 200 and 2000 data sources, every 15 minutes
          • Perform preprocessing (search, transform, index) over each source
          • Publish harvested content for distributed content services and downstream Mashups
      • Knowledge of Interest:
          • People, places, events
      • Web Content To Gather:
          • ~118 3rd Party Financial News Services and Blogs, including: BBC, CNN, Yahoo News, Financial Times, NY Times, The Big Picture, Fox News, PR Newswire, Market Watch, World Press, Forbes, Google News, Wall Street Journal, MSNBC, The Sun, ZDNet
  • 10. IBM Emerging Technology Project: M2
      • What is it? An insight engine for enabling ad-hoc business insights for business users, at web scale
      • How does it work? Discovery Process
          1. Point M2 to data sources of interest
              • Unstructured web data, feeds, XML, etc.
          2. Transform data into a form that can be analyzed
              • Unstructured data becomes semi-structured data
              • Example: name: Rod Smith, employer: IBM, state: GA
              • Apply analytics - enriching the data
          3. “What if tooling” - browser-based visual front end - spreadsheet metaphor to create worksheets for exploring/visualizing the data
      • What's different?
          • Unlocking insights embedded in unstructured data
          • Analyzing data previously unavailable to analyze
  • 11. M2 -> Demo
      • Project: Improve IP Portfolio Analysis for Mergers & Acquisitions
          • “...please collect all US Patent filings… then let’s do…”
      • Business Questions
          • How much is a target company worth?
          • What are the high-value areas of their portfolio?
          • Explored cited patent topics, litigated patents
      • Knowledge of Interest:
          • Patents ranked by citation, e.g. how often a patent was referenced determines value
          • Corporate genealogies IP ownership roll-up
          • Augment analysis with items affecting IP value, inventor affiliation, citation rank by time
      • Web Content To Gather:
          • Gathered 1.4m patent docs from USPTO
          • 1991-2007 case records from the United States Court of Appeals for the Federal Circuit (CAFC)
  • 12. What's Under the Covers: Hadoop
      • Emergence of map/reduce programming model for a new class of webApp
      • Hadoop: provides a framework for large scale parallel processing map/reduce apps (Apache projects led by Yahoo)
          • Offers simplicity of “programming” - Looks like a simple single threaded app model for developers
          • Handles big data - scalable storage across machine clusters (think read-only file system)
          • Deployment: no application knowledge of runtime or OS or cloud necessary
          • Today - setting up, coding Hadoop jobs in Java, etc. is the domain of skilled Java engineers
  • 13. IBM Emerging Technology Project: M2 Architectural Components
      • Expanding upon the Hadoop stack
          • Visual tooling builds extensively on Pig
      • M2 Architecture Characteristics:
          • Extensible via UDFs
          • REST API for customer choice of analytic service/engine
          • REST API for choice of visualization packages
          • Export content as feeds, XML, etc.
          • ...more to come
  • 14. Conclusions: In God we trust
  • 15. Conclusions: …all others bring data
  • 16. Conclusions
      • Enterprises quickly evolving their thinking from a Database strategy to a Data Strategy encompassing unstructured & structured content
      • Repeatable business patterns in broad range of industries emerging
      • Hadoop has potential to be the platform for broad range of solutions from web-based analytics -> business event processing -> collaboration
  • 17. Almost The End
      • Selecting customer proof of concept projects
      • INTERESTED? www-01.ibm.com/software/ebusiness/jstart/about.html
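
Slides 10 to 12 describe turning unstructured text into semi-structured records, ranking patents by how often they are cited, and the map/reduce model that Hadoop runs at scale. The sketch below is one way those three ideas might fit together, written as plain single-process Python rather than as Hadoop, Pig, or M2 code. The record layout, the patent IDs and owners, and every function name are assumptions made for illustration; none of them come from the talk.

```python
from collections import defaultdict

# Hypothetical semi-structured records, in the spirit of slide 10's
# "name: Rod Smith, employer: IBM, state: GA" example. All values are invented.
RAW_RECORDS = [
    "patent: P1, owner: Acme, cites: P2;P3",
    "patent: P2, owner: Acme, cites: P3",
    "patent: P3, owner: Globex, cites: ",
    "patent: P4, owner: Globex, cites: P3;P2",
]


def parse_record(line):
    """Turn one 'key: value, key: value' line into a dict."""
    record = {}
    for field in line.split(","):
        key, _, value = field.partition(":")
        record[key.strip()] = value.strip()
    return record


def map_citations(record):
    """Map phase: emit a (cited_patent, 1) pair for every citation."""
    for cited in record["cites"].split(";"):
        if cited:  # skip the empty string produced by an empty cites field
            yield cited, 1


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: sum the 1s emitted for a single patent."""
    return key, sum(values)


def rank_by_citation(lines):
    """Parse, map, shuffle, reduce, then sort by citation count (descending)."""
    records = [parse_record(line) for line in lines]
    pairs = [pair for record in records for pair in map_citations(record)]
    counts = [reduce_counts(key, values) for key, values in shuffle(pairs).items()]
    return sorted(counts, key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for patent, count in rank_by_citation(RAW_RECORDS):
        print(patent, count)
```

The map and reduce functions each see one record or one key at a time, which is the "simple single threaded app model" slide 12 refers to. In a real cluster the framework would handle the shuffle step and spread the work across machines.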