Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Big data and the cloud are perfect partners for companies that want to unlock maximum value from all of their unstructured, semi-structured, and structured data. The challenge has been how to create and manage a reliable end-to-end solution that spans data ingestion, storage, and analysis in the face of the volume, velocity, and variety of big data sources.

In this webinar, we will show you how to achieve big data bliss by combining StreamSets Data Collector, which specializes in creating and running complex any-to-any dataflows, with Microsoft's Azure Data Lake and Azure analytic solutions.

We will walk through an example of how a major bank is using StreamSets to transport its on-premises data to the Azure cloud computing platform and Azure Data Lake, so that it can take advantage of analytics tools with unprecedented scale and performance.
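
Data Collector is described here as a tool for "any-to-any" dataflows, in which records move from an origin, through processors, to one or more destinations. The toy runner below only illustrates that shape. `run_pipeline`, `onprem_table`, `parse_amount`, and the list-based destinations are hypothetical names chosen for this sketch. They are not the StreamSets API, and the in-memory lists stand in for a real on-premises source and a real data lake.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

# A record is a flat mapping of field name to value (an assumption of this sketch).
Record = Dict[str, Any]
# Hypothetical stage types: an origin yields records, a processor transforms
# a record or returns None to drop it, a destination writes a record.
Origin = Callable[[], Iterable[Record]]
Processor = Callable[[Record], Optional[Record]]
Destination = Callable[[Record], None]


def run_pipeline(
    origin: Origin, processors: List[Processor], destinations: List[Destination]
) -> int:
    """Read every record from the origin, apply each processor in order, and
    write each surviving record to every destination. Returns records written."""
    written = 0
    for record in origin():
        current: Optional[Record] = record
        for process in processors:
            current = process(current)
            if current is None:  # the processor dropped this record
                break
        if current is None:
            continue
        for write in destinations:
            write(current)
        written += 1
    return written


# Usage: an in-memory list stands in for an on-premises table.
def onprem_table() -> Iterable[Record]:
    return [
        {"account": "A-1", "amount": "120.50", "branch": "NYC"},
        {"account": "A-2", "amount": "n/a", "branch": "LON"},
        {"account": "A-3", "amount": "75.00", "branch": "NYC"},
    ]


def parse_amount(record: Record) -> Optional[Record]:
    """Convert the amount field to float; drop records where that fails."""
    try:
        return {**record, "amount": float(record["amount"])}
    except ValueError:
        return None


lake: List[Record] = []  # stands in for a data lake folder
audit: List[str] = []    # a second destination fed from the same flow

count = run_pipeline(
    onprem_table,
    [parse_amount],
    [lake.append, lambda r: audit.append(r["account"])],
)
print(count, audit)  # 2 ['A-1', 'A-3']
```

Because stages are plain callables, adding another origin or destination only means passing a different function. This is the property the "multiple types of origins and destinations using a single tool" point refers to, though a real engine adds batching, error handling, and delivery guarantees that this sketch omits.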

  1. Enabling Next Gen Analytics with Azure Data Lake
  2. Microsoft Azure
  3. Microsoft Cloud: Global, Trusted, Hybrid
  4. Big Data Definition: "Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation." (Gartner, Big Data, Stamford, CT: Gartner, 2016, URL: http://www.gartner.com/it-glossary/big-data/)
  5. Big Data as a Cornerstone of Cortana Intelligence (diagram). Layers: Data Sources (apps, sensors and devices, data); Information Management (Event Hubs, Data Catalog, Data Factory); Big Data Stores (SQL Data Warehouse, Data Lake Store); Machine Learning and Analytics (HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning); Intelligence (Cortana, Bot Framework, Cognitive Services); Dashboards & Visualizations (Power BI); Action (people, automated systems, apps, web, mobile, bots).
  6. However, there are challenges to big data: obtaining skills and capabilities; determining how to get value; integrating with existing IT investments. (Gartner, Survey Analysis – Hadoop Adoption Drivers and Challenges, Stamford, CT: Gartner, 2015)
  7. Azure HDInsight, a cloud Spark and Hadoop service for the enterprise: reliable, with an industry-leading SLA; enterprise-grade security and monitoring; a productive platform for developers and scientists; cost-effective cloud scale; integration with leading ISV applications; easy for administrators to manage; 63% lower TCO than deploying your own Hadoop on-premises (IDC study, "The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight").
  8. HDInsight Application Platform: one-click deploy experience for installing apps; fully managed PaaS offering; access to the entire cluster, secure by default; install apps on new or existing clusters; ease of authoring and deployment; certified partners only.
  9. Hybrid cloud, a reality today: 74% of enterprises believe a hybrid cloud will enable business growth¹; 82% of enterprises have a hybrid cloud strategy, up from 74% a year ago². Workload requirements: regulation, sensitive data, customization, latency, legacy support.
  10. Introduction to StreamSets for Microsoft Azure
  11. Who is StreamSets? Mission: empower enterprises to harness their data in motion. Enterprise data DNA; top-tier investors; commercial customers across verticals; strong partner ecosystem. Products: StreamSets Dataflow Performance Manager™ (DPM) and StreamSets Data Collector™ (open source). Open source success: 150,000 downloads; ⅓ of the Fortune 100.
  12. StreamSets Solution: data-in-motion middleware that ensures data trust. Desired business outcomes: developer and operator efficiency; on-time delivery; data trust and governance.
  13. StreamSets Products: StreamSets Data Collector (SDC), open source tooling and engine to build complex any-to-any dataflows; StreamSets Dataflow Performance Manager (DPM), a cloud service to map, measure and master dataflow operations. Dataflow lifecycle: Develop, Operate, Evolve (proactive), Remediate (reactive). Roles: developers, scientists and architects; operators, stewards and architects.
  14. StreamSets Deployment Models: install on a local machine; install on an Azure VM.
  15. StreamSets Deployment Models
  16. StreamSets and Microsoft Azure in Use in a Major Bank
  17. The Customer: a Forbes Global 500 financial services company; adopting and moving into the cloud at a rapid pace; growing rapidly through both acquisitions and organic growth.
  18. Key Challenges Related to Data Movement: a number of legacy tools, both customer-built and vendor-built; security policy changes that are very hard to manage; lack of security governance due to fragmented tools and a lack of standardization; difficulty onboarding new data sources as soon as they are created (technology change); data drift (unexpected changes) that is very hard to manage at scale.
  19. Key Factors for the Customer to Consider StreamSets: KPIs; delivery guarantees; multiple types of origins and destinations using a single tool; works natively with Microsoft Azure, as part of HDInsight or on an Azure virtual machine, or deployed on-premises; visualization of actual data transfers; ability to define security boundaries, actors, etc.; repeating pattern.
  20. Customer’s Business Objectives: short-lived compute and long-term storage (ADLS, Azure Blob), which in turn enables fine-grained cost control; the ability to build a microanalytics framework (for instance, instead of processing the entire dataset, build smaller micro datasets and derive results faster, with faster iteration); a move away from a traditional data lake to Azure Data Lake to manage cost and scale.
  21. Use Cases for StreamSets: 1) data movement from on-premises to Azure Data Lake; 2) consolidating migration tools into a single tool; 3) building disaster recovery (DR) for HDInsight Kafka workloads.
  22. Resources / Q & A: StreamSets Data Collector @ Azure Marketplace, https://azure.microsoft.com/en-us/marketplace/partners/streamsets/streamsets-data-collector/ ; Ingest Data into Microsoft Azure Data Lake (YouTube), https://www.youtube.com/watch?v=c1dVnOK7Luw ; StreamSets Community, https://streamsets.com/community/ ; StreamSets Dataflow Performance Manager product information, https://streamsets.com/products/dpm/
  23. Thanks!
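
Slide 18 names data drift (unexpected changes) as very hard to manage at scale. One simple way to make drift visible is to compare each incoming batch against an expected schema. The sketch below is hypothetical and does not describe how StreamSets Data Collector handles drift. It assumes records arrive as Python dicts and treats a schema as a mapping from field name to type name. `infer_schema` and `detect_drift` are illustrative names only.

```python
from typing import Any, Dict, List, Set, Tuple

Record = Dict[str, Any]
# Assumed schema model: field name -> type name (mixed types joined with "|").
Schema = Dict[str, str]


def infer_schema(records: List[Record]) -> Schema:
    """Collect the Python type names observed for each field across a batch."""
    seen: Dict[str, Set[str]] = {}
    for record in records:
        for name, value in record.items():
            seen.setdefault(name, set()).add(type(value).__name__)
    return {name: "|".join(sorted(types)) for name, types in seen.items()}


def detect_drift(expected: Schema, records: List[Record]) -> Dict[str, List[Any]]:
    """Compare a batch against the expected schema and report the differences."""
    actual = infer_schema(records)
    type_changed: List[Tuple[str, str, str]] = sorted(
        (name, expected[name], actual[name])
        for name in set(expected) & set(actual)
        if expected[name] != actual[name]
    )
    return {
        "added": sorted(set(actual) - set(expected)),    # new fields appeared
        "missing": sorted(set(expected) - set(actual)),  # fields absent from every record
        "type_changed": type_changed,                     # same field, different type(s)
    }


expected = {"account": "str", "amount": "float", "branch": "str"}
batch = [
    {"account": "A-1", "amount": 120.5, "branch": "NYC", "currency": "USD"},
    {"account": "A-2", "amount": "75.00", "branch": "LON", "currency": "GBP"},
]
report = detect_drift(expected, batch)
print(report)
# {'added': ['currency'], 'missing': [], 'type_changed': [('amount', 'float', 'float|str')]}
```

A field that disappears from only some records is not reported as missing here. The sketch also leaves open what to do once drift is found, for example stopping the flow, routing records to an error store, or evolving the target schema.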
