Qonnections2015 - Why Qlik is better with Big Data

Big Data is everywhere. We generate enough data to track every single transaction, sensor, and interaction. But what do we do with it? Using JethroData and Qlik, you can easily unlock the value of Big Data. Qlik can help you find outliers and trends in billions of rows and make what seems a long, esoteric process easy. The presentation goes through the problem statement and a Qlik-like approach to solving the problem. We will also introduce JethroData, a new Qlik Partner.

Published in: Technology

  1. Why Qlik® is Better with Big Data. John Park, Qlik; Eli Singer, JethroData. 28 April 2015
  2. #qonnections Presenters: Eli Singer • CEO & Co-Founder, JethroData • Data management, information security, e-commerce • Over 20 years leading start-ups • Twitter: @jethrodata. John Park • Senior Solutions Architect, Partner Engineering • ETL, data warehousing, software design, *NIX, architecture • 2 years at Qlik, 7 years as a data warehouse consultant • Twitter: @jpark328
  3. Legal Disclaimer (©Qlik Confidential). This Presentation contains forward-looking statements, including, but not limited to, statements regarding the value and effectiveness of Qlik's products, the introduction of product enhancements or additional products, Qlik's partner and customer relationships, and Qlik's growth, expansion and market leadership, that involve risks, uncertainties, assumptions and other factors which, if they do not materialize or prove correct, could cause Qlik's results to differ materially from those expressed or implied by such forward-looking statements. All statements, other than statements of historical fact, are statements that could be deemed forward-looking statements, including statements containing the words "predicts," "plan," "expects," "anticipates," "see," "believes," "goal," "target," "estimate," "potential," "may," "will," "might," "could," and similar words. Qlik intends all such forward-looking statements to be covered by the safe harbor provisions for forward-looking statements contained in Section 21E of the Exchange Act and the Private Securities Litigation Reform Act of 1995.
Actual results may differ materially from those projected in such statements due to various factors, including but not limited to: risks and uncertainties inherent in our business; our ability to attract new customers and retain existing customers; our ability to effectively sell, service and support our products; our ability to manage our international operations; our ability to compete effectively; our ability to develop and introduce new products and add-ons or enhancements to existing products; our ability to continue to promote and maintain our brand in a cost-effective manner; our ability to manage growth; our ability to attract and retain key personnel; the scope and validity of intellectual property rights applicable to our products; adverse economic conditions in general and adverse economic conditions specifically affecting the markets in which we operate; and other risks and uncertainties more fully described in Qlik's publicly available filings with the Securities and Exchange Commission. Past performance is not necessarily indicative of future results. The forward-looking statements included in this presentation represent Qlik's views as of the date of this presentation. Qlik anticipates that subsequent events and developments will cause its views to change. Qlik undertakes no intention or obligation to update or revise any forward-looking statements, whether as a result of new information, future events or otherwise. These forward-looking statements should not be relied upon as representing Qlik's views as of any date subsequent to the date of this presentation. This Presentation should be read in conjunction with Qlik's periodic reports filed with the SEC (SEC Information), including the disclosures therein of certain factors which may affect Qlik's future performance.
Individual statements appearing in this Presentation are intended to be read in conjunction with and in the context of the complete SEC Information documents in which they appear, rather than as stand-alone statements. This presentation is intended to outline our general product direction and should not be relied on in making a purchase decision, as the development, release, and timing of any features or functionality described for our products remains at our sole discretion. © 2015 QlikTech International AB. All rights reserved. Qlik®, QlikView®, Qlik® Sense, QlikTech®, and the Qlik logos are trademarks of QlikTech International AB which have been registered in multiple countries. Other marks and logos mentioned herein are trademarks or registered trademarks of their respective owners.
  4. Agenda: Why Qlik® is Better with Big Data • Introduction • Notes about Big Data / Hadoop / Data Lakes • Current state of Data Lake implementations • JethroData overview • Demo: Let's analyze 2.5 billion rows from the Data Lake • JethroData architecture • Why Qlik is better with Big Data • Key takeaways
  5. Introduction: Data, technology, and use cases are growing exponentially, but it seems we do not know what to do with them
  6. Big Data / Hadoop / Data Lakes: Big Data is a marketing term
  7. Big Data / Hadoop / Data Lakes: Hadoop is a collection of services and toolkits working with HDFS
  8. Big Data / Hadoop / Data Lakes: A Data Lake is an architecture pattern where data is stored/landed for processing by *
  9. What is happening? • Enterprises are adopting Hadoop for data scientists (science project -> real "production") • Hadoop is maturing • Data Lake architecture is being adopted • Companies and startups are building innovative services on Hadoop • Data is becoming the lifeblood of companies. However, there are an incredible number of challenges in using Hadoop as the single source of truth and sole underpinning of a BI platform. Image source: Gartner's 2014 Hype Cycle for Emerging Technologies Maps the Journey to Digital Business, Gartner (August 2014)
  10. Data Lake vision vs. reality: Scalability, low cost, and performance were promised, but…
  11. Data Lake vision vs. reality: Scalability, low cost, and performance were promised, but… it has been a challenge working with BI tools. Is the Data Lake frozen?
  12. Reasons why we are struggling • Data Lakes were not designed to run ad-hoc / interactive queries • Hadoop was designed to run batch processes (ML, ETL, predictive, canned reporting) • Designed for data scientists, not business analysts • Our trade-off: cost vs. scale vs. performance
  13. Strategies for bridging the gap • Use the best of technology to overcome the gap • Solve technical problems one issue at a time • Let's do interactive query on Hadoop!
  14. Eli Singer – More about JethroData
  15. Who is JethroData? • A Qlik Technology Partner • Developing a next-generation analytical database • Focused on making interactive BI on Big Data a reality • Combines analytical DB design with full-indexing technology • JethroData 1.0 went GA Apr 7, 2015 • Offices in NY and Israel • Backed by world-class VCs
  16. Common use cases • Challenge: BI on Hadoop is slow • Current solution: Replicate data into a separate EDW system for fast BI • Challenge: EDW is expensive • Current solution: Migrate batch processes to Hadoop, keep BI on EDW • Pain: Maintaining two separate data systems is expensive and an operational nightmare • With Jethro: Run fast BI directly on data in Hadoop
  17. Why Is Hadoop So Slow? Architecture: MPP / Full-Scan (all SQL-on-Hadoop). Query: list books by author "Stephen King". Process: each librarian is assigned a rack; they then view each book, check whether the author is "Stephen King", and if so, get the book title. Result: too slow, costly, unscalable
  18. JethroData and Big Data: Index Access. Architecture: Index Access (only JethroData). Query 1: list books by author "Stephen King". Process: go to the Author index, find the entry for "Stephen King", get the list of books, fetch only these books. Result: fast, minimal resources, scalable
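
The librarian analogy on the two slides above can be sketched in a few lines of Python. This is only an illustration of full scan versus index access in general, not JethroData's implementation; the catalog contents and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy catalog: (title, author) pairs standing in for table rows.
BOOKS = [
    ("Carrie", "Stephen King"),
    ("Emma", "Jane Austen"),
    ("The Shining", "Stephen King"),
    ("Dune", "Frank Herbert"),
    ("Persuasion", "Jane Austen"),
]


def full_scan(books, author):
    """Inspect every row; the work grows with the size of the catalog."""
    titles, rows_read = [], 0
    for title, row_author in books:
        rows_read += 1
        if row_author == author:
            titles.append(title)
    return titles, rows_read


def build_author_index(books):
    """Map each author to the positions of that author's rows."""
    index = defaultdict(list)
    for position, (_, author) in enumerate(books):
        index[author].append(position)
    return index


def index_access(books, index, author):
    """Look up the author entry, then fetch only the matching rows."""
    positions = index.get(author, [])
    titles = [books[p][0] for p in positions]
    return titles, len(positions)


AUTHOR_INDEX = build_author_index(BOOKS)
scan_titles, scan_reads = full_scan(BOOKS, "Stephen King")
index_titles, index_reads = index_access(BOOKS, AUTHOR_INDEX, "Stephen King")
```

Both calls return the same titles, but the full scan reads all five rows while the index lookup reads only the two matching ones. The index has to be built and stored up front, which is the cost paid for the cheaper lookups.
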
  19. Let's Analyze 2.5 Billion Rows
  20. Qlik and Jethro Demo Setup • Hadoop environment: CDH 5.3; servers: NN: 1x r3.large; DN: 9x m1.large • JethroData server: 1x r3.8xlarge; 244 GB RAM; 320 GB SSD (spot) • Qlik Sense 1.10: r3.2xlarge, 4 vCPU, 30 GB RAM • Demo 1, TPC-DS: based on TPC-DS, replication factor 1,000; Sales_demo table based on the store_sales fact table with dimension data added; 2.5B rows, 33 columns; 600 GB raw data • Demo 2, Airline: airline data, 123 million commercial flight records + CSV (hybrid architecture) • Everything hosted on Amazon Web Services [Diagram labels: Business Discovery Apps; Qlik Server; Big Data Platform; Big Data Indexing; HTTP, HTTPS protocol; Hadoop client; HDFS; Direct Discovery; ODBC protocol]
  21. SQL-on-Hadoop Architecture: MPP/Full-Scan. Client: SELECT state, sum(sales) FROM t1 WHERE prod='abc' GROUP BY state. [Diagram: five data nodes, each with a query executor and a query planner/manager.] Performance and resources are based on the size of the dataset
  22. Jethro SQL-on-Hadoop Architecture: Index Access. Client: SELECT state, sum(sales) FROM t1 WHERE prod='abc' GROUP BY state. [Diagram: five data nodes and two Jethro query nodes.] 1. Index access 2. Read data only for required rows. Performance and resources are based on the size of the result set
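
To make the two steps on the slide above concrete, here is a minimal sketch of the example query answered through an index on `prod`. The table contents, the column-oriented layout, and the helper names are assumptions made for illustration; nothing here reflects how Jethro actually stores or reads data.

```python
from collections import defaultdict

# Hypothetical column-oriented copy of t1; the values are made up.
T1 = {
    "state": ["NY", "CA", "NY", "TX", "CA", "NY"],
    "prod": ["abc", "xyz", "abc", "abc", "xyz", "def"],
    "sales": [10, 20, 5, 7, 30, 40],
}


def build_index(column):
    """Inverted list: column value -> row ids holding that value."""
    index = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)
    return index


PROD_INDEX = build_index(T1["prod"])


def sales_by_state(table, prod_index, prod):
    """SELECT state, sum(sales) FROM t1 WHERE prod=? GROUP BY state."""
    # Step 1: index access gives the ids of the qualifying rows.
    row_ids = prod_index.get(prod, [])
    # Step 2: read state and sales only for those rows.
    totals = defaultdict(int)
    for row_id in row_ids:
        totals[table["state"][row_id]] += table["sales"][row_id]
    return dict(totals), len(row_ids)


result, rows_read = sales_by_state(T1, PROD_INDEX, "abc")
```

Here only three of the six rows are touched, so the work tracks the number of matching rows and not the table size, which is the point the slide makes about result-set-sized cost.
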
  23. SQL on Hadoop: competitive landscape. Full-scan based solutions (read all rows, every time): Hive • Impala • Presto • SparkSQL • Drill • Pivotal/HAWQ • IBM/Big SQL • Actian • Teradata/SQL-H • Microsoft/PDW. Index-based solution (read ONLY needed rows): JethroData. Use-case comparison: full-scan is optimal for ETL and predictive; index is optimal for interactive BI
  24. JethroData: system overview. Components: JethroLoader, JethroServer, Hadoop. Data sources: Hadoop • EDW • Streams • … Queries: BI tools • SQL client • … 1. Initial load: extract data from relevant sources and load through JethroLoader. Incremental data can be loaded at short intervals. 2. Index and column files are stored in HDFS (or S3). Typical size is 35% of raw data. 3. Queries are sent via a standard ODBC/JDBC interface, with automatic load balancing across servers. 4. JethroServer communicates directly with HDFS to retrieve relevant data. No MapReduce or Spark is used.
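
As a rough worked example, the "typical size is 35% of raw data" figure can be applied to the 600 GB of raw data quoted for Demo 1. The helper below is hypothetical, and the result is only an estimate from the stated typical ratio, not a measured size for the demo.

```python
RAW_DATA_GB = 600     # Demo 1 raw data size, as quoted on the demo setup slide
TYPICAL_RATIO = 0.35  # "typical size is 35% of raw data"


def estimated_index_size_gb(raw_gb, ratio=TYPICAL_RATIO):
    """Estimate the stored size of index and column files from the raw size."""
    return raw_gb * ratio


estimate_gb = round(estimated_index_size_gb(RAW_DATA_GB))
```

Under that assumption the index and column files for the demo would come to roughly 210 GB.
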
  25. JethroData: architecture highlights • Every column is indexed: allows users to slice and dice any way they choose and always get a fast response. • Scales to any size: from 100M to 100B; columns and indexes are compressed and partitioned. • Super-easy to implement: compatible with every Hadoop distribution; installs on separate server(s) from the Hadoop cluster.
  26. Jethro Indexes: innovative technology • Fast to read: Simple (inverted-list indexes map each column value to a list of rows); Fast (direct O(1) access to each value entry); Scale (distributed, highly hierarchical compressed bitmaps) • Fast to write: index files are appended, duplicate entries allowed; incremental (new data is indexed as it comes in); no locks, no random read/write. Patent pending: http://www.google.com/patents/WO2013001535A3?cl=en
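
The read and write properties listed on the slide above can be illustrated with a toy append-only inverted index. This is a sketch under simplifying assumptions: plain Python integers stand in for bitmaps, there is no compression, hierarchy, or distribution, and the class and method names are hypothetical, not Jethro's file format or API.

```python
from collections import defaultdict


class AppendOnlyIndex:
    """Toy inverted index: each column value maps to a list of bitmaps."""

    def __init__(self):
        # value -> bitmaps, one per load batch; earlier bitmaps are never
        # rewritten, so a value may have several (duplicate) entries.
        self.entries = defaultdict(list)
        self.row_count = 0

    def append_batch(self, values):
        """Index newly loaded rows by appending fresh entries only."""
        batch = {}
        for offset, value in enumerate(values):
            row_id = self.row_count + offset
            batch[value] = batch.get(value, 0) | (1 << row_id)
        for value, bitmap in batch.items():
            self.entries[value].append(bitmap)
        self.row_count += len(values)

    def lookup(self, value):
        """Hash lookup of the value, then OR together its bitmaps."""
        bitmap = 0
        for part in self.entries.get(value, []):
            bitmap |= part
        return [r for r in range(self.row_count) if (bitmap >> r) & 1]


index = AppendOnlyIndex()
index.append_batch(["NY", "CA", "NY"])  # initial load
index.append_batch(["CA", "NY"])        # incremental load
ny_rows = index.lookup("NY")
ca_rows = index.lookup("CA")
```

After the second load, "NY" has two entries that are merged at read time, which mirrors the "appended, duplicate entries allowed" idea: writes never modify existing entries, and reads pay a small merge cost instead.
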
  27. Intelligent caching technology • Reuse of intermediate/final query results: repeat queries in sub-seconds • Addresses wide top-of-the-funnel queries: analysis starts with queries with no/few filters; those queries are often repeated in dashboard scenarios • Transparently adapts to incremental loads: execution on delta data + merge with saved results [Charts: query speed vs. query selectivity; query repeat vs. query selectivity]
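
The "execution on delta data + merge saved results" idea from the slide above can be sketched as a small cache for a grouped sum. The sketch assumes rows are only ever appended; the class, its fields, and the sample data are hypothetical and say nothing about how Jethro's cache is actually built.

```python
from collections import Counter


class IncrementalSumCache:
    """Cache grouped sums and extend them when new rows are loaded."""

    def __init__(self):
        self.saved = {}        # query key -> (rows covered, saved totals)
        self.rows_scanned = 0  # counts rows actually processed

    def sum_by(self, rows, group_col, value_col):
        key = (group_col, value_col)
        covered, totals = self.saved.get(key, (0, Counter()))
        delta = rows[covered:]  # only rows loaded since the last run
        self.rows_scanned += len(delta)
        merged = Counter(totals)
        for row in delta:
            merged[row[group_col]] += row[value_col]
        self.saved[key] = (len(rows), merged)
        return dict(merged)


rows = [{"state": "NY", "sales": 10}, {"state": "CA", "sales": 20}]
cache = IncrementalSumCache()
first = cache.sum_by(rows, "state", "sales")
scanned_after_first = cache.rows_scanned
repeat = cache.sum_by(rows, "state", "sales")  # served from saved totals
scanned_after_repeat = cache.rows_scanned
rows.append({"state": "NY", "sales": 5})       # incremental load
after_load = cache.sum_by(rows, "state", "sales")
```

The repeated query scans no rows at all, and the query after the incremental load scans only the one new row before merging it into the saved totals. This works because sums can be merged; aggregates that cannot be combined from partial results would need a different strategy.
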
  28. Advantage of Qlik and JethroData • Now we can get scale, cost, and performance • Faster queries with more selection criteria • Faster Direct Discovery load time, because unique dimension values are already available in JethroData indexes • Use of system-wide optimization and smart caching • Creates an interactive Hadoop for Business Discovery • Allows Qlik customers to have the same experience across all data sources, including Hadoop. [Diagram: Qlik application; Direct Discovery queries]
  29. Why Qlik Is Better for Big Data • Allows users to analyze big data the way business users want to. • Scalability, extensibility, and beautiful visualizations to tell your story.
  30. Key takeaways • Qlik can do Big Data well with the right complementary technology • Qlik's associative interface can open up new use cases with Big Data • Qlik and Jethro can provide true interactive BI with Hadoop
  31. Follow-up resources • Publicly accessible Qlik Sense hub: http://jethrodata.qlik.com • Free evaluation version of Jethro available at http://jethrodata.com/download
  32. Thank You
