Web Briefing: Unlock the power of Hadoop to enable interactive analytics


  1. Unlock the power of Hadoop to enable interactive analytics & real-time Business Intelligence (July 10, 2013)
  2. Web Briefing: Unlock the power of Hadoop to enable interactive analytics
     • Thank you for joining today’s session!
     • The web briefing will start momentarily.
     • We will use the WebEx Q&A feature for questions.
     Today’s slides are available at www.slideshare.net/kognitio
     Follow the conversation on Twitter: @Hortonworks @Kognitio
     Teleconference: use your computer, or call US +1 631 267 4890 or UK +44-203-478-5289 (passcode: 841 203 797)
  3. Web Briefing Agenda: Unlock the power of Hadoop to enable interactive analytics (July 10, 2013)
     • Modern Data Architectures - John Kreisa
     • Hadoop meets Mature BI: Interactive Analytics - Michael Hiskey
     • Demonstration: SQL and Hadoop with in-memory MPP Acceleration - Stuart Watt
  4. © Hortonworks Inc. 2013
     Modern Data Architectures: Big data drivers and patterns
     John Kreisa, VP Strategic Marketing, Hortonworks (@marked_man)
  5. Existing Data Architecture
     [Diagram: data sources (traditional sources: RDBMS, OLTP, OLAP; OLTP and POS systems); data systems (traditional repositories: RDBMS, EDW, MPP); applications (business analytics, custom applications, enterprise applications); operational tools (manage & monitor); dev & data tools (build & test).]
  6. 6 Common Types of Hadoop Data
     1. Sentiment: understand how your customers feel about your brand and products, right now
     2. Clickstream: capture and analyze website visitors’ data trails and optimize your website
     3. Sensor/Machine: discover patterns in data streaming automatically from remote sensors and machines
     4. Geographic: analyze location-based data to manage operations where they occur
     5. Server Logs: research logs to diagnose process failures and prevent security breaches
     6. Unstructured (text, video, pictures, etc.): understand patterns in text across millions of web pages, emails, and documents
  7. Next-Generation Data Architecture
     [Diagram: data sources (traditional sources: RDBMS, OLTP, OLAP; new sources: web logs, email, sensors, social media); data systems (Hortonworks Data Platform; traditional repositories: RDBMS, EDW, MPP; in-memory MPP accelerator); applications (BI tools & OLAP clients, Microsoft applications); operational tools (manage & monitor); dev & data tools (build & test).]
  8. Interoperating With Your Data Tools
     [Diagram: data sources (traditional sources: RDBMS, OLTP, OLAP; new sources: web logs, email, sensors, social media); data systems (Hortonworks Data Platform, traditional repositories, in-memory MPP accelerator); applications (Microsoft applications); operational tools (Viewpoint); dev & data tools.]
  9. Hadoop Common Patterns of Use
     [Diagram: big data (transactions, interactions, observations) flows into the Hortonworks Data Platform, which serves business cases through three patterns (Refine, Explore, Enrich) and three access modes (Batch, Interactive, Online), providing “right-time” access to data.]
  10. Modern Data Architecture: Hadoop as a Shared Data Lake
      • Store all data and build/enable applications on a shared “data lake”
      • Delivers broad value across the enterprise
      • Seamless SQL access with interactive analytics
      • A more mature organization will have this as a goal for Hadoop
      [Diagram: sources (new sources: logs, clicks, social media, sensors; traditional sources: RDBMS, OLTP, OLAP); data systems (enterprise Hadoop platform / Hortonworks Data Platform as the data-lake infrastructure; traditional repositories: RDBMS, EDW, MPP; in-memory MPP accelerator); applications (custom analytic app, packaged analytic app).]
  11. Hadoop for New Targeted Applications
      • Many organizations start here and expand usage
      • Driven by a type of data that could not be analyzed before Hadoop
      • Delivers explicit value for a business case or an individual line of business (LOB)
      • Complementary to existing applications that use SQL
      • Interactive analytics with in-memory MPP execution of R, Python, Perl, etc.
      [Diagram: catalyst: a type of data, driving a business application; sources (new sources: logs, clicks, social media, sensors; traditional sources: RDBMS, OLTP, OLAP); data systems (enterprise Hadoop platform / Hortonworks Data Platform; traditional repositories: RDBMS, EDW, MPP; in-memory MPP accelerator); applications (custom analytic app, packaged analytic app).]
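The slide above mentions in-memory MPP execution of R, Python and Perl, and the R forecasting script on the next slide reads its rows from stdin with the data partitioned by PRODNO. A minimal Python sketch of that execution model, assuming the engine groups rows by a partition key, hands each partition to a separate script invocation as headerless CSV, and collects each invocation's output. Threads and a plain function stand in for real processes; run_per_partition and summarise_partition are hypothetical names, not Kognitio's API.

```python
import csv
import io
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, key):
    """Group row dicts by the value of `key`, mimicking a PARTITION BY clause."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


def to_csv(rows, columns):
    """Serialise one partition as headerless CSV, as a script would see it on stdin."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()


def summarise_partition(stdin_text):
    """Hypothetical user script: reads CSV text, emits one 'key,rows,total' line."""
    records = list(csv.reader(io.StringIO(stdin_text)))
    key = records[0][0]
    total = sum(float(r[1]) for r in records)
    return f"{key},{len(records)},{total}"


def run_per_partition(rows, key, columns, script):
    """Run `script` once per partition in parallel; no call sees another partition's rows."""
    parts = partition_rows(rows, key)
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(lambda part: script(to_csv(part, columns)), parts.values()))
    return sorted(outputs)
```

Because each invocation sees only its own partition, partitions can run in parallel without coordination, which is what lets this pattern spread across an MPP cluster.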
  12. HDP: Enterprise Hadoop Distribution
      Hortonworks Data Platform (HDP): Enterprise Hadoop
      • The ONLY 100% open source and complete distribution
      • Enterprise grade, proven and tested at scale
      • Ecosystem endorsed to ensure interoperability
      [Diagram: HDP stack, deployable on OS, cloud, VM or appliance. Hadoop core: HDFS, MapReduce. Platform services: enterprise readiness (high availability, disaster recovery, security and snapshots). Data services: Hive & HCatalog, Pig, HBase; load & extract: Sqoop, Flume, NFS, WebHDFS. Operational services: Oozie, Ambari.]
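WebHDFS, listed under load & extract on the slide above, exposes HDFS over HTTP at /webhdfs/v1/<path>?op=<OPERATION>. A minimal sketch using only the Python standard library, assuming a NameNode HTTP port of 50070 (the Hadoop 1.x/2.x default) and a hypothetical host name and user. Only URL construction and response parsing run offline; list_status needs a reachable cluster.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1<path>?op=<OP>&..."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def parse_list_status(body):
    """Extract (name, type) pairs from a LISTSTATUS JSON response."""
    return [(s["pathSuffix"], s["type"]) for s in body["FileStatuses"]["FileStatus"]]


def list_status(host, port, path, user):
    """Call LISTSTATUS on a live cluster (hypothetical host; requires network access)."""
    url = webhdfs_url(host, port, path, "LISTSTATUS", **{"user.name": user})
    with urlopen(url) as resp:
        return parse_list_status(json.load(resp))
```

For example, list_status("namenode.example.com", 50070, "/user/demo", "demo") would issue a GET to the URL that webhdfs_url builds for those arguments.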
  13. Hadoop meets Mature BI: Interactive Analytics
      Michael Hiskey, VP of Marketing & Business Development (@mphnyc)
  14. Mature Business Intelligence and Reporting
      • Numbers, tables, charts, indicators… accessed with ease and simplicity
      • Historical information, latency
      • BI tools have plateaued
      Decision Support
      • Advanced analytics and data science
      • More math… a lot more math
  15. Drive for a deeper level of understanding
      Analytics shown: Reporting, Fraud detection, Behavior modelling, Statistical Analysis, Dynamic Simulation
      [Slide background: code samples of increasing complexity, truncated in the source: simple SQL reporting queries (monthly sales totals from summary and history tables for year 2006, month 5; department sales filtered with HAVING sum(sales) > 50000), a windowed SQL query ranking yearly account activity by number of transactions and total spend, and a Kognitio external script, LM_PRODUCT_FORECAST (environment rsint, partitioned by PRODNO, isolated partitions), that runs an R linear fit on daily sales.]
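The R script in the slide background is truncated, but its visible steps compute each day of week's share of total sales, divide daily sales by that share to get a base-sales series, and fit lm(BASESALES ~ ID). A Python sketch of those steps, assuming days of week are coded 0-6, rows are consecutive days covering at least one full week, and the forecast is re-scaled by the day-of-week share; that final step is an assumption, since it is cut off in the source.

```python
from collections import defaultdict


def dow_shares(rows):
    """Share of total sales falling on each day of week (0-6)."""
    totals = defaultdict(float)
    for dow, _row_id, sales in rows:
        totals[dow] += sales
    grand = sum(totals.values())
    return {dow: t / grand for dow, t in totals.items()}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x, as lm(y ~ x) computes."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast(rows, horizon):
    """rows: (dow 0-6, row_id, daily_sales), consecutive days, at least one full week."""
    shares = dow_shares(rows)
    ids = [row_id for _, row_id, _ in rows]
    # De-seasonalise: divide each day's sales by its day-of-week share.
    base = [sales / shares[dow] for dow, _, sales in rows]
    intercept, slope = fit_line(ids, base)
    last_dow, last_id, _ = max(rows, key=lambda r: r[1])
    predictions = []
    for step in range(1, horizon + 1):
        dow = (last_dow + step) % 7  # assumes day of week advances with row_id
        # Assumed re-seasonalising step: trend value times that day's share.
        predictions.append((intercept + slope * (last_id + step)) * shares[dow])
    return predictions
```

Dividing out the weekly pattern before fitting keeps the straight line from chasing weekday/weekend swings; the pattern is then reapplied to each forecast day.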
  16. The Analytical Enterprise
      Business Analyst, Systems Admin, Data Scientist (“Sexiest job of the 21st Century?”)
      Key: “Graduation”
      • Projects will need to graduate easily from the data science lab and become part of business as usual
  17. Your goal: PRESS HERE… and really cool Big Data stuff happens!
  18. Big Data: Bring the Analytics TO the Data
      Kognitio Hadoop Integration
      • Kognitio Map/Reduce Agent uploads itself to Hadoop nodes
      • Query passes selections and relevant predicates
      • Data filtering & projection happen locally on each node
      • Data is filtered as it is read from file(s)
      • Only data of interest is transferred and loaded into memory via parallel load streams
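The integration steps on the slide above amount to selection and projection pushdown: each node filters rows as it reads its files and ships only the requested columns, and the per-node streams are loaded in parallel. A minimal sketch that simulates this with threads over in-memory CSV text; the real agent runs inside Hadoop and its protocol is not described here, so scan_node and parallel_load are hypothetical names.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def scan_node(file_text, columns, predicate):
    """Runs 'on the node': filter rows while reading, keep only requested columns."""
    for row in csv.DictReader(io.StringIO(file_text)):
        if predicate(row):  # selection evaluated where the data lives
            yield {c: row[c] for c in columns}  # projection applied before transfer


def parallel_load(node_files, columns, predicate):
    """One load stream per node; only filtered, projected rows reach 'memory'."""
    with ThreadPoolExecutor() as pool:
        streams = pool.map(lambda text: list(scan_node(text, columns, predicate)), node_files)
        loaded = [row for stream in streams for row in stream]
    return loaded
```

The saving comes from the predicate and column list travelling to the data, so rows that fail the filter and unused columns never cross the network.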
  19. Demonstration: SQL & Hadoop with in-memory MPP Acceleration
      Stuart Watt, Senior Systems Engineer (@Kognitio)
  20. Hortonworks Snapshot
      We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution.
      • We distribute the only 100% open source Enterprise Hadoop distribution: Hortonworks Data Platform
      • We engineer, test & certify HDP for enterprise usage
      • We employ the core architects, builders and operators of Apache Hadoop
      • We drive innovation within Apache Software Foundation projects
      • We are uniquely positioned to deliver the highest quality of Hadoop support
      • We enable the ecosystem to work better with Hadoop
      Develop, Distribute, Support; endorsed by strategic partners
      Headquarters: Palo Alto, CA
      Employees: 200+ and growing
      Investors: Benchmark, Index, Yahoo, Tenaya, Dragoneer
  21. Kognitio Snapshot: Mature SQL atop Hadoop
      Kognitio is an in-memory analytical platform that is tightly integrated with Hadoop for high-performance advanced analytics that make Big Data more consumable for enterprises, especially those with mature BI environments or ingrained tools.
      • Privately held
      • Invented the in-memory analytical platform
      • Labs in the UK; HQ in New York, NY
      • Powering advanced analytics at organizations worldwide, such as: [customer logos]
  22. Interactive analytics with Hadoop: Getting Started
      • Assess your environment and use case for Hortonworks Data Platform + Kognitio Analytical Platform. Request a Meeting: www.kognitio.com/hadoop
      • Download Hortonworks Sandbox (ZERO to big data in 15 minutes): www.hortonworks.com/sandbox
      • Sign up for Training for in-depth learning: hortonworks.com/hadoop-training/
      • Download the Kognitio Analytical Platform (no registration required; perpetual license, no time limits): www.kognitio.com/free
  23. Unlock the power of Hadoop to enable interactive analytics
      The Question & Answer session will be conducted electronically, using the panel to the right of your screen.
      Today’s slides are available at: www.slideshare.net/kognitio
      Download Hortonworks Sandbox: www.hortonworks.com/sandbox
      Download the Kognitio Analytical Platform (no registration required; perpetual license, no time limits): www.kognitio.com/free
      Request a Meeting: www.kognitio.com/hadoop
  24. Connect: www.kognitio.com | twitter.com/kognitio | linkedin.com/companies/kognitio | tinyurl.com/kognitio | youtube.com/kognitio | +1 855 KOGNITIO
  25. Hortonworks Sandbox: Fastest onramp to Apache Hadoop
      • What is it?
        - Free, virtualized single-node version of Hortonworks Data Platform
        - A personal Hadoop environment
        - An integrated learning environment with hands-on step-by-step tutorials
      • What does it do?
        - Dramatically accelerates the process of learning Apache Hadoop
        - Accelerates & validates the use of Hadoop within your unique data architecture
        - Lets you use your own data to explore and investigate your use cases
      • ZERO to big data in 15 minutes
      • Get Started! Download Hortonworks Sandbox: www.hortonworks.com/sandbox
      • Sign up for Training for in-depth learning: hortonworks.com/hadoop-training/