
Big Data Tundra: Creating a Flexible Cloud Based Data Ecosystem

Cloud computing has changed how organizations use, access and store their data. While these paradigms have shifted, the traditional way of thinking about databases and data warehouses remains rooted in “on-prem” thinking, even in many cloud deployments. Can cloud-native data platforms such as Snowflake, coupled with Big Data thinking, enable better performance, lower total cost of ownership, and higher data flexibility? This session will walk you through a real-world customer story to provide the answer.

Published in: Data & Analytics

  1. Big Data Tundra: Building Flexible Cloud Data Ecosystems. Phil Goerdt, Mike Fuller
  2. © 2017 Red Pill Analytics. Agenda
     - Introductions
     - Building a Flexible Data Ecosystem
     - Snowflake Data Warehouse
     - StreamSets Data Collector
     - Looker
     - Questions
  3. RED PILL ANALYTICS: WHAT WE DO
     - Capacity Analytics: BI Development as a Service, Agile Development, Cloud Enabled, Continuous Integration, Support
     - Checkmate: Check-in & Automate, True Multi-User Development, Full Source Control, Continuous Integration, Hosted or On-Premise
     - Single Doses: Strategy & Roadmap, Architecture, Prototyping, Infrastructure, Training
  4. Mike Fuller
     - Consultant at Red Pill Analytics
     - Primarily a Release Lead & Developer
     - Full-Stack BI
     - Have been working with data for 9 years
     - Focused on Business Intelligence for 5 years
     - Worn several IT and Business hats: Business Analyst, Reporting Analyst, Data Analyst, BI Engineer, BI Developer, Data Architect, Warehouse Architect, Release Lead, Consultant, etc. … Just don’t call me late for dinner
  5. Phil Goerdt
     - “Full stack” BI and DI guy passionate about using and making data better
     - Have worked with data for last 8+ years
     - Last 5+ have been BI focused
     - Consultant at Red Pill Analytics
     - Hipster Alert: Have played the part of data scientist… before it was a buzzword!
     - Find me on LinkedIn, Twitter, Medium and the Red Pill Blog
  6. What is a Flexible Data Ecosystem?
  7. Client Engagement - Requirements
     - Consolidate disparate data to promote conformity and uniformity
     - One central location for all data to power data-driven decisions for the enterprise
     - Create a data lake to collect and structure business process data
     - Use data from the data lake downstream to power reporting and analytics
     - “How can we prevent our data lake from turning into a data swamp?”
     - Cloud-based
     - Limit up-front costs
     - Follow KISS modus operandi: do not over-engineer
  8. A Flexible Data Ecosystem Will...
     - Allow developers to work using agile methodology
     - Allow data to persist in multiple states
     - Allow data to be consumed in multiple use cases
     - Be adaptable to future change
     - Allocate resources (compute/storage) on demand
     - Accommodate varying data types
     - Do all of the above while maintaining sensible governance
  9. How Do We Turn This Into Something Real? (Diagram labels: Logging Data, Application Data, Data Lake, Reports & Analytics)
  10. A Cloud Ecosystem in the Wild (Diagram labels: Logging Data, Application Data ×3)
  11. Why Choose Snowflake?
  12. Snowflakes are Made in Clouds
     - Snowflake is a truly “cloud engineered” data warehouse solution
     - Separating storage from compute lifts hardware and virtualware constraints
     - This allows for true scalability on both sides of the cluster
  13. Snowflake: The Database (Storage)
     - AWS S3 holds all of the data inserted into Snowflake
     - Virtually no limit for data!
     - This allows for querying of non-traditional data types such as JSON or Avro
     - Queries written in ANSI SQL allow developers to use a language they already know
  14. Snowflake: The Warehouse (Compute)
     - Compute is run on AWS EC2 instance clusters
     - This allows for scaling compute resources horizontally (cluster count) and vertically (node count) cheaply and easily!
     - This also allows different sized clusters for different needs
     - Example scenarios:
       - Data scientists may need large compute for a short period of time
       - Finance users may need smaller compute all the time
  15. Maintenance?! Patching?! Deployment Lifecycle Management?!
  16. Getting Answers, Not Fighting Fires
     - Snowflake managing the platform = fewer headaches (for you and DBAs)
     - 11 9’s Durable & 4 9’s Available
     - Compliant with security requirements such as PII/PHI/HIPAA
     - OpEx model allows for a small initial investment compared to the CapEx seen with traditional data warehousing solutions
     - “Oops” features like Time Travel and Undrop allow for recovery in development
  17. Performance Anxiety? Not Here!
     - One Snowflake instance can contain multiple databases
     - Users can query across databases, which is beneficial for “unsiloing” data
     - Self-contained querying across databases facilitates the data lake/reservoir concept with better organization (no data swamp!)
     - Zero Copy Clone: create additional environments at no additional cost
     - Performs well on large data sets
     - Data Warehouse = Optimized for analytic queries
  18. Streaming: Getting the Data to the Cloud
  19. StreamSets Data Collector Integration
     - Low-latency ingest infrastructure tool
     - Create continuous data ingest pipelines using a drag and drop UI
     - Open Source and runs on Linux or Mac OS
     - Native integrations with AWS, HDFS, SFDC, Mongo, etc.
     - JDBC Connectivity
     - Simple to stream into S3 and use bucketed object stores in lieu of typical DFS solutions
     - Natural candidate for deployment via EC2 to leverage AWS platform (security, CLI, etc.)
  20. Data Through the Looking Glass
  21. Looking at the Data… with Looker
     - Data analytics platform
     - Uses a logical/semantic modeling layer allowing for enterprise friendly delivery
     - Integrates with Git for version control
     - Available in the Cloud (SaaS) or on-premises
     - Allows creation of reports and dashboards for content distribution among users
     - Natively connects directly to the Snowflake database and supports wide table querying
  22. Would We Build It Again?
  23. Yeah, We Definitely Would
     - StreamSets Data Collector is a simple-to-use, high performing data streaming tool and it is free
     - Snowflake’s S3 storage allowed us to build the warehouse as and when we saw fit
     - Allows for a truly agile approach to development of a data warehouse
     - No need to worry about sizing requirements due to Snowflake’s flexibility
     - Could build the facts, dimensions and views in Looker as requirements were received and understood
  24. Review: A Cloud Ecosystem in the Wild (Diagram labels: Logging Data, Application Data ×3)
  25. Questions?
  27. WHAT WE DO
     - Capacity Analytics: BI Development as a Service, Agile Development, Cloud Enabled, Continuous Integration, Support
     - Checkmate: Check-in & Automate, True Multi-User Development, Full Source Control, Continuous Integration, Hosted or On-Premise
     - Single Doses: Strategy & Roadmap, Architecture, Prototyping, Infrastructure, Training
  28. Capacity Analytics: Our approach to rapidly deliver analytics to you.
     - Capacity-driven: Add capacity to your team by choosing small, medium, or large, with flexibility to increase or decrease as needed.
     - Flexible: You receive an allowance of points each sprint to spend however you choose.
     - Agile: We have a complete methodology for how to deliver your initiatives rapidly.
     - Fluid: Hire a team, not a person. Tasks are assigned to the right person with the right skill.
  29. Follow us:
     - twitter.com/RedPillA
     - linkedin.com/company/red-pill-analytics
     - youtube.com/redpillanalytics
     - youtube.com/realtimebi
     - facebook.com/redpillanalytics
     - redpillanalytics.com
     - bit.ly/datavizdaily-playlist
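
Slide 16 quotes "11 9's Durable & 4 9's Available" without unpacking the numbers. The sketch below is a back-of-the-envelope reading of those figures, not anything taken from the deck. It assumes the availability figure applies over a 365-day year and that the durability figure is an annual, per-object probability (the way S3's design target is usually phrased). The function names are made up for illustration.

```python
# Rough arithmetic for the "nines" quoted on slide 16.
# All names here are illustrative; none belong to a Snowflake or AWS API.

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: ignore leap years


def uncovered_fraction(nines: int) -> float:
    """Fraction left uncovered by an N-nines figure, e.g. 4 nines -> 0.0001."""
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    return 10.0 ** -nines


def downtime_minutes_per_year(nines: int) -> float:
    """Downtime per year permitted by an N-nines availability figure."""
    return uncovered_fraction(nines) * MINUTES_PER_YEAR


def expected_objects_lost_per_year(nines: int, stored_objects: int) -> float:
    """Expected objects lost per year, assuming independent per-object loss."""
    return uncovered_fraction(nines) * stored_objects


if __name__ == "__main__":
    # Four nines of availability: a little under an hour of downtime a year.
    print(f"{downtime_minutes_per_year(4):.2f} minutes/year")
    # Eleven nines of durability across ten million stored objects.
    print(f"{expected_objects_lost_per_year(11, 10_000_000):.6f} objects/year")
```

Under these assumptions, four nines works out to roughly 52.6 minutes of downtime per year, and eleven nines across ten million objects works out to about one lost object every 10,000 years.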

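Slides 13, 14, 16 and 17 name several Snowflake features (JSON querying, warehouse resizing, Time Travel, Undrop, Zero Copy Clone) without showing what they look like in practice. The sketch below only builds the corresponding SQL text as Python strings; it does not connect to Snowflake. The helper names and every object name used in the examples (`raw_events`, `payload`, `ds_wh`, `orders`, `prod_db`, `dev_db`) are hypothetical, and the statements should be checked against the current Snowflake SQL reference before use.

```python
import re

# Plain unquoted identifiers: a letter or underscore, followed by letters,
# digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")

# Assumption: a subset of the warehouse sizes Snowflake accepts.
_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def ident(name: str) -> str:
    """Return name unchanged if it is a plain identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def select_json_field(table: str, variant_column: str, path: str) -> str:
    """Slide 13: read one field from JSON stored in a VARIANT column."""
    dotted = ".".join(ident(part) for part in path.split("."))
    return f"SELECT {ident(variant_column)}:{dotted}::string FROM {ident(table)};"


def resize_warehouse(warehouse: str, size: str) -> str:
    """Slide 14: resize a virtual warehouse (the slide's vertical scaling)."""
    if size.upper() not in _SIZES:
        raise ValueError(f"unknown warehouse size: {size!r}")
    return f"ALTER WAREHOUSE {ident(warehouse)} SET WAREHOUSE_SIZE = '{size.upper()}';"


def time_travel_select(table: str, seconds_ago: int) -> str:
    """Slide 16: query a table as it was seconds_ago seconds in the past."""
    if seconds_ago <= 0:
        raise ValueError("seconds_ago must be positive")
    return f"SELECT * FROM {ident(table)} AT(OFFSET => -{int(seconds_ago)});"


def undrop_table(table: str) -> str:
    """Slide 16: restore the most recently dropped version of a table."""
    return f"UNDROP TABLE {ident(table)};"


def clone_database(source: str, target: str) -> str:
    """Slide 17: create a zero-copy clone of a database."""
    return f"CREATE DATABASE {ident(target)} CLONE {ident(source)};"


if __name__ == "__main__":
    print(select_json_field("raw_events", "payload", "user.id"))
    print(resize_warehouse("ds_wh", "large"))
    print(time_travel_select("orders", 300))
    print(undrop_table("orders"))
    print(clone_database("prod_db", "dev_db"))
```

Keeping the statements as validated strings makes the sketch runnable without a Snowflake account; in a real pipeline they would be handed to whatever client the project already uses.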