
How Concur uses Big Data to get you to Tableau Conference On Time

This is my presentation from the Cloudera Customer Showcase at Tableau Conference #Data14: How Concur Uses Big Data to Get You to Tableau Conference On Time. We discuss Hadoop, Hive, Impala, and Spark in the context of Consolidation, Visualization, Insight, and Recommendation.

Published in: Data & Analytics

  1. How Concur Uses Big Data to Get You to TC (Tableau Conference) On Time. Denny Lee, Senior Director, Data Sciences Engineering
  2. About Concur: what do we do?
      • The world's leading provider of spend management solutions and services (Travel, Invoice, TripIt, etc.)
      • Global customer base of 20,000 clients and 25 million users
      • Processes more than $50 billion in Travel & Expense (T&E) spend each year
  3. About the speaker: who am I?
      • Long-time SQL Server BI guy (the 24 TB Yahoo! cube)
      • Project Isotope (Hadoop on Windows and Azure)
      • At Concur, helping with Big Data and Data Sciences
  4. Is Big Data the most overused buzzword today, or an actually useful framework? Yes!
  5. TechBar themes: Consolidate, Visualize, Insight, Recommend
  6. Consolidate
  7. Sources to consolidate: BTS, Invoice, Web Analytics, Expense, Travel, Weather
  8. A long time ago…
      • We started using Hadoop because:
          • It was free, i.e. we didn't want to pay for a big data warehouse
          • We could slowly extract from hundreds of relational data sources, consolidate the data, and query it
      • We were not thinking about advanced analytics; we were thinking "cheaper reporting"
      • We had some hardware lying around … let's cobble it together, and now we have reports
  9. But why Hadoop?
      • Even with primarily relational systems, it involved hundreds of sources
      • Getting Tableau or any BI tool to connect to so many sources is … not fun
      • More often than not, we needed to understand a subset or aggregate of this data, not all of it!
      • We can use Pig to process, extract, and filter the data
      • We can use Hive, which provides a SQL-like query language (HiveQL), to query the data
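The consolidate-then-aggregate pattern above can be sketched with plain Scala collections, as an in-memory stand-in for the Pig/Hive jobs that do this over files in HDFS. The `SpendRecord` type, the source names, the categories, and the sample amounts are all hypothetical, chosen only to illustrate the shape of the work.

```scala
// Hypothetical row type: one spend line extracted from one of many relational sources.
final case class SpendRecord(source: String, category: String, amountUsd: BigDecimal)

object Consolidation {
  // Consolidate: union the per-source extracts into a single collection
  // (roughly what loading every extract into one Hive table achieves on a cluster).
  def consolidate(extracts: Seq[Seq[SpendRecord]]): Seq[SpendRecord] = extracts.flatten

  // Aggregate: reports usually need totals per category, not every individual row.
  def totalsByCategory(records: Seq[SpendRecord]): Map[String, BigDecimal] =
    records.groupBy(_.category).map { case (category, rows) => category -> rows.map(_.amountUsd).sum }
}

// Hypothetical sample extracts from three sources.
val expense = Seq(
  SpendRecord("expense", "Meals", BigDecimal("12.50")),
  SpendRecord("expense", "Hotel", BigDecimal("180.00"))
)
val travel  = Seq(SpendRecord("travel", "Air", BigDecimal("420.00")))
val invoice = Seq(SpendRecord("invoice", "Meals", BigDecimal("7.50")))

val totals = Consolidation.totalsByCategory(Consolidation.consolidate(Seq(expense, travel, invoice)))
println(totals("Meals")) // prints 20.00
```

The point of the design is the same as on the slide: the BI tool then connects to one small, aggregated result instead of hundreds of raw sources.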
  10. Invoice, Expense, Travel
  11. Visualize
  12. Demo: querying Hive via Hue and Tableau to understand air traffic patterns
  13. Connecting to Hive using Hue: you can query using HiveQL, a SQL-like query language
  14. Install the Cloudera Hive driver, choose the Cloudera Hadoop connection, fill in the connection details, and you're connected to Hive
  15. Connecting Tableau to Hive in Live mode can take a very long time
  16. Instead, choose Extract, which brings the data across from Hive so that your queries run inside Tableau. Note that the extraction will take a long time too!
  17. Now that the data is in Tableau, I can pivot, slice, and filter at the speed of thought!
  18. We can quickly switch to map mode and determine where most itineraries in 2013 originated
  19. If you're expecting Hadoop or Hive to be fast…
  20. Evolution of Hive
      • Hive, originally built by Facebook, placed a SQL-like query language in front of Hadoop MapReduce
      • It inherits MapReduce's flexibility, but also its overhead and complexity
      • The Apache community is working on the Hive Stinger project to advance Hive, including a DAG scheduler, an optimized columnar format, and improved engine semantics
  21. Insight
  22. Demo: querying Impala via Hue and Tableau to understand air departure delays
  23. Query airport information using Impala; so far it looks a lot like Hive…
  24. But notice that the query runs significantly faster in Impala!
  25. Not just LIMIT 10-style queries, but also queries with more complicated WHERE clauses
  26. And quickly chart the results, e.g. the highest airport in Taiwan is Sun Moon Lake
  27. Or map the airport locations to see that Sun Moon Lake Airport is in the center of Taiwan
  28. And Impala is not just for Hue; it's even better with Tableau
  29. Now I can connect to my data live and get fast query results back in Tableau
  30. After quickly reshaping the data in Tableau, we can see the number of flight delays to Seattle and note that San Jose has the fewest delays
  31. Why Impala?
      • Its focus is speeding up BI queries
      • Analogous to relational BI tools, except that now I can do this against a distributed cluster
      • Like relational BI tools, it is special-purpose, so it can apply many optimizations to improve speed
      • But note that this demo ran against the same Hive table, over the same data stored in Hadoop
  32. Demo: leveraging AtScale to build models on Impala and slicing them in Tableau
  33. Using AtScale to build a dimensional model on top of the data stored in Impala / Hive
  34. Slice and filter the Impala model using Tableau. For more info, check out: http://atscale.com/
  35. Data Extraction: how do you query multiple endpoints or multiple data sources? Set up a whole bunch of VMs and have someone connect to each one and execute GET commands?
  36. Optimizing Data Extraction: use Hadoop Streaming to execute a Python script that performs the GETs. Hadoop generates a task for each API GET call and executes them in parallel across the nodes of the cluster
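A single-machine sketch of the same fan-out idea, using Scala `Future`s instead of Hadoop Streaming (a swapped-in technique, not the pipeline from the talk): one concurrent task per endpoint, then gather all results in input order. The endpoint URLs and the `fetch` stub are hypothetical; a real job would issue HTTP requests and handle retries, rate limits, and failures.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelExtract {
  // Hypothetical stand-in for one API GET; a real task would issue an HTTP request here.
  def fetch(endpoint: String): String = s"payload from $endpoint"

  // Start one concurrent task per endpoint (analogous to one Hadoop task per GET),
  // then block until all complete; Future.sequence keeps results in input order.
  def fetchAll(endpoints: Seq[String], timeout: FiniteDuration = 30.seconds): Seq[(String, String)] = {
    val tasks = endpoints.map(endpoint => Future(endpoint -> fetch(endpoint)))
    Await.result(Future.sequence(tasks), timeout)
  }
}

// Hypothetical endpoints, for illustration only.
val endpoints = (1 to 5).map(i => s"https://api.example.com/source/$i")
ParallelExtract.fetchAll(endpoints).foreach { case (endpoint, body) => println(s"$endpoint -> $body") }
```

The helpers live in their own object so that the worker threads never need the enclosing script object, which is still initializing while `Await.result` blocks.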
  37. Recommend
  38. TechBar: Quick Primer on Apache Spark
  39. What is Apache Spark?
      • A fast and general cluster computing system, interoperable with Hadoop
      • Improves efficiency through:
          » In-memory computing primitives
          » General computation graphs
      • Improves usability through:
          » Rich APIs in Scala, Java, Python
          » Interactive shell
      • Up to 100× faster than Hadoop MapReduce (2-10× on disk); 2-5× less code
  40. Project History
      • Started in 2009, open sourced in 2010
      • 30+ companies now contributing code: Databricks, Yahoo!, Intel, Adobe, Cloudera, Bizo, …
      • One of the largest communities in big data
  41. A General Stack: Spark at the core, with Spark Streaming (real-time), Shark (SQL), GraphX (graph), MLlib (machine learning), …
  42. Demo: applying Spark for recommendations
  43. Expense Categorization: one of my receipts that I had OCRed
      Starbucks Store #3313
      601 108th Ave NE
      Bellevue, WA
      (425) 646-9602
      -------------------------------
      Chk 713452
      05/14/2014 11:04 AM
      1961558 Drawer: 1 Reg: 1
      -------------------------------
      Bacon Art Brkfst 3.45
      Warmed
      T1 Latte 2.70
      Triple 1.50
      Soy 0.60
      Gr Vanilla Mac 4.15
      Reload Card 50.00
      AMEX $50.00
      XXXXXXXXXXXXXXXXXX1004
      SBUX Card $13.56
      SUBTOTAL $62.40
      New Caffe Espresso Frappuccino(R) Blended beverage
      Our Signature Frappuccino(R) roast coffee and fresh milk, blended with ice.
      Topped with our new espresso whipped cream and new Italian roast drizzle

      One of the problems we're trying to solve is auto-categorizing receipts like this one, so how can we do it? Below is a simplistic solution using WordCount. Note that a real solution should involve machine learning algorithms.
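A minimal sketch of the "simplistic WordCount" idea in plain Scala: count the receipt's tokens, then score each expense category by how many tokens match its keyword list and pick the best-scoring one. The category names and keyword lists are hypothetical; as the slide says, a real solution would learn this mapping with machine learning rather than hand-picked keywords.

```scala
object ReceiptCategorizer {
  // Hypothetical keyword lists; a production system would learn these from labeled receipts.
  val categoryKeywords: Map[String, Set[String]] = Map(
    "Meals"   -> Set("latte", "espresso", "frappuccino(r)", "brkfst", "coffee"),
    "Airfare" -> Set("airline", "flight", "boarding", "fare"),
    "Lodging" -> Set("hotel", "room", "night", "suite")
  )

  // Word counts over lower-cased, whitespace-separated tokens (empty tokens dropped).
  def wordCounts(text: String): Map[String, Int] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).groupBy(identity).map { case (w, ws) => w -> ws.length }

  // Score = total count of receipt tokens that appear in the category's keyword list.
  // toSeq before map so that equal per-keyword counts are not collapsed by Set semantics.
  def categorize(text: String): Option[String] = {
    val counts = wordCounts(text)
    val scores = categoryKeywords.map { case (category, keywords) =>
      category -> keywords.toSeq.map(k => counts.getOrElse(k, 0)).sum
    }
    val (bestCategory, bestScore) = scores.maxBy(_._2)
    if (bestScore > 0) Some(bestCategory) else None // None when no keyword matches at all
  }
}

println(ReceiptCategorizer.categorize("Warmed T1 Latte 2.70 Triple 1.50 Soy 0.60")) // prints Some(Meals)
```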
  44. Loading the OCRed receipt in the Spark shell:
```
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
2014-09-07 22:31:21.064 java[1871:15527] Unable to load realm info from SCDynamicStore
14/09/07 22:31:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc.

scala> val receipt = sc.textFile("/usr/local/Cellar/workspace/data/receipt/receipt.txt")
receipt: org.apache.spark.rdd.RDD[String] = /usr/local/Cellar/workspace/data/receipt/receipt.txt MappedRDD[1] at textFile at <console>:12

scala> receipt.count
res0: Long = 30
```
  45. Counting the words in the receipt:
```
scala> val words = receipt.flatMap(_.split(" "))
words: org.apache.spark.rdd.RDD[String] = FlatMappedRDD[2] at flatMap at <console>:14

scala> words.count
res1: Long = 161

scala> words.distinct.count
res2: Long = 72

scala> val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _).map{case(x,y) => (y,x)}.sortByKey(false).map{case(i,j) => (j, i)}
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = MappedRDD[12] at map at <console>:16

scala> wordCounts.take(12)
res5: Array[(String, Int)] = Array(("",82), (with,2), (Card,2), (new,2), (------------------------------- ,2), (Frappuccino(R),2), (roast,2), (1,2), (and,2), (New,1), (Topped,1), (Starbucks,1))
```
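In the session above, the most frequent "word" is the empty string (82 occurrences), most likely because `split(" ")` returns an empty token between consecutive spaces and for blank lines. A plain-Scala sketch of a cleaner count splits on runs of whitespace, drops empty tokens, and sorts by count directly instead of swapping tuples around `sortByKey`. The names `WordCount`, `tokenize`, and `topWords` are hypothetical.

```scala
object WordCount {
  // Split on runs of whitespace and drop empty tokens:
  // "a  b".split(" ") yields an empty string between the two spaces, and "".split(" ") yields one too.
  def tokenize(lines: Seq[String]): Seq[String] =
    lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)

  // Count occurrences, then sort by descending count (ties broken alphabetically for a stable result).
  def topWords(lines: Seq[String], n: Int): Seq[(String, Int)] =
    tokenize(lines)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toSeq
      .sortBy { case (word, count) => (-count, word) }
      .take(n)
}

val receiptLines = Seq("Card  Card", "", "roast and roast")
WordCount.topWords(receiptLines, 3).foreach { case (w, c) => println(s"$w\t$c") }
// prints: Card 2 / roast 2 / and 1 (tab-separated, one per line)
```

On the RDD itself, the equivalent change should be something like `receipt.flatMap(_.split("\\s+")).filter(_.nonEmpty).map((_, 1)).reduceByKey(_ + _).sortBy(_._2, ascending = false)`; `RDD.sortBy` is part of the Spark 1.x API, but this variant has not been run against the 1.1.0 shell shown here.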
  46. Still in beta, but you can connect from Tableau to Spark SQL using the Shark driver
  47. You can (or soon will be able to) connect to Spark SQL live
  48. Quick view of Android vs. iOS mobile sessions
  49. Spark SQL: What's Next?
      • Currently makes use of the Hive code base
      • Major focus for 1.2
      • Pluggable external data sources
      • Easier access through a pure SQL interface
      • Access things like JSON tables through SQL?
  50. Consolidate, Visualize, Insight, Recommend
  51. Invite
      • Pacific Northwest Cloudera User Group: http://bit.ly/1uFD6vJ
          • Doug Cutting, Hadoop co-creator, will be speaking at Disney on 9/24
      • Seattle Spark Meetup: http://bit.ly/1q4Z0Ke
          • Next sessions:
              • Deep Dive into Spark and Mesos Internals
              • Unlocking your Hadoop data with Apache Spark and CDH5
  52. Q&A
