Analytics Web Day | From Theory to Practice: Big Data Stories from the Field

This session offers insights from two recent implementations of cloud-based Big Data clusters. The first solution targets DWH offloading and machine learning in the telecommunications industry. The session covers how we established the data transfer between on-premises servers and cloud services. We also talk about Spark jobs on an EMR cluster, Hive with the Glue Data Catalog to query data stored in S3, quick analytics with Athena, hosting and testing Exasol on EC2 instances, and provisioning the cloud infrastructure with CloudFormation. Looking at an earlier phase in the AWS adoption lifecycle, we also talk about an insurance company finding its way into the AWS cloud. Its goal is to complement the existing enterprise DWH with more agile, data-science-oriented tools from the cloud, aiming at machine learning and artificial intelligence to support the claims workflow. This part covers topics such as the security setup in IAM and the connectivity configuration in EC2 and EMR, with S3 covering the storage needs.

Speakers: Roland Wammers, Matthias Diekstall, Manuel Marowski, Opitz Consulting Deutschland GmbH

Published in: Technology

  1. Big Data Stories from the Field: From Theory to Practice. Matthias Diekstall, Roland Wammers, Manuel Marowski. © OPITZ CONSULTING 2018. Information classification: Public. Überraschend mehr Möglichkeiten (Surprisingly more possibilities).
  2. Agenda
     1. DWH Modernization with AWS BigData
     2. Advanced Analytics & Complex Event Processing at congstar
     3. Stream Analytics & Machine Learning with AWS OC Quickstarter
  3. Section 1: DWH Modernization with AWS BigData at an Insurance Company
     • Once upon a Time …
     • Defined Targets
     • Challenges
     • Our Proposal
     • Technical Implementation
     • … and they lived happily ever after
  4. Once upon a Time …
     • Mid-sized insurance company
       • 6000 employees
       • 4 M clients
       • 14 M contracts
       • 3.2 B EUR in revenues
     • Enterprise DWH established
       • Standard reporting in place
     • Data mining in a few departments
       • Using MS Excel mostly
       • Partially R desktop usage
  5. Defined Targets
     • Get a feeling for new technologies (Hadoop ecosystem)
     • Learn their approach to data processing
     • Low investment
     • "Big Data Test Drive"
     • Increase flexibility for data sources
     • Enable self-service for departments on a larger scale
  6. Challenges
     • No tangible use case initially
     • No decision regarding products or license model
     • No good grasp of the fundamental concepts of Big Data technologies
     • Few resources for driving this project
     • No hardware available at short notice
     • Direct connectivity to source systems questionable
  7. Our Proposal
     • Quick start with a cloud-based solution
     • Start small and allow for growth
     • Allow a wide variety of technologies without having to dedicate resources to administration and operation
     • To be more precise:
       • Prepare the environment for easy startup
       • Train and coach employees in essential aspects
       • Use AWS technologies
  8. Technical Implementation
     • AWS IAM for user management
     • AWS S3 for data storage
     • AWS EMR as the basis for data processing
       • Hive
       • Pig
       • Spark
       • Python
     • Zeppelin as graphical frontend
       • Augmented with RStudio
     • Mini tutorials for users
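As a rough illustration of how IAM user management and S3 storage fit together in a setup like the one on slide 8, the sketch below builds a policy document that limits a user to a single prefix of a bucket. The bucket name `example-datalake` and the prefix `sandbox/` are hypothetical; the slides do not show the policies that were actually used.

```python
import json


def s3_prefix_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document granting read/write access to one S3 prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it is narrowed by a condition.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object-level actions apply only to keys below the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


# Hypothetical bucket and prefix, chosen only for this example.
policy = s3_prefix_policy("example-datalake", "sandbox/")
print(json.dumps(policy, indent=2))
```

Keeping one prefix per department is one possible way to support the self-service goal while keeping permissions narrow; other layouts are equally valid.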
  10. AWS mini tutorials for users
  11. … and they lived happily ever after
     • Results
       • Targets achieved at minimal cost (< $500 in ~3 months)
       • Competency development
       • Better understanding of "how it works"
     • Lessons learned
       • Focus on as few tools as possible
       • Create simple step-by-step tutorials
       • Even a hypothetical use case is better than none
  12. Section 2: Advanced Analytics & Complex Event Processing at congstar
     • First Thoughts
     • Creating the Base
     • Working with the Data
     • First Steps to Advanced Analytics
  13. congstar GmbH
     • Subsidiary of Telekom Deutschland GmbH
     • Founded in July 2007
     • Sells mobile contracts and DSL
     • Over 4,500,000 customers
  14. Motivation
     • Better understand the user
     • Improve the user experience
     • Enhance existing systems
     • Be prepared for future requirements
     • Create new content in reasonable time
  15. Challenges
     • Building a big data system for advanced analytics and complex event processing in AWS
     • Finding the right technologies in Hadoop
     • Finding suitable AWS services
     • Keeping the costs low
     • Provisioning
     • Replacing old systems with new technology
     • Secure data transfer between on-premises systems and AWS
     • Live agile
  16. Infrastructure as code
     • Testing resources and services via the AWS Management Console
     • Creating CloudFormation templates
       • Infrastructure as code
       • Create stacks for the development, test and production systems
     • Working with stacks
       • Adjustments are made in the code
       • Diff of old and new code
       • Rollback function in case of error
     • Establishing a secure VPN connection
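The workflow on slide 16 (adjust the template in code, then diff old against new before updating a stack) can be sketched with the Python standard library alone. This is a minimal illustration, not the templates from the project: the resource name `DataLakeBucket`, the bucket naming scheme and the file names in the diff header are all assumptions.

```python
import difflib
import json


def bucket_template(environment: str, versioning: bool = False) -> str:
    """Return a minimal CloudFormation template (as JSON) containing one S3 bucket."""
    bucket_properties = {
        # Hypothetical naming scheme: one bucket per environment/stack.
        "BucketName": f"example-datalake-{environment}",
    }
    if versioning:
        bucket_properties["VersioningConfiguration"] = {"Status": "Enabled"}
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Data lake bucket for the {environment} stack",
        "Resources": {
            "DataLakeBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": bucket_properties,
            }
        },
    }
    # Sorted keys keep the output stable, so diffs show only real changes.
    return json.dumps(template, indent=2, sort_keys=True)


old = bucket_template("dev")
new = bucket_template("dev", versioning=True)

# Review the adjustment as a unified diff before the stack is updated.
diff = list(
    difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="template.old.json",
        tofile="template.new.json",
        lineterm="",
    )
)
print("\n".join(diff))
```

Calling `bucket_template("test")` or `bucket_template("prod")` yields the same structure for the other environments, which is the main benefit of generating all three stacks from one definition.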
  17. Overview of the basic infrastructure
  18. Collecting and loading data into S3
     • Data transfer
       • The initial connection can only be established from the on-premises network
       • An on-premises solution is needed to transfer data into S3
     • Apache NiFi
       • Web UI
       • Schedule flows
       • No programming skills needed
       • Limited to the processors used
       • Formats: CSV, Avro
  19. Process data
     • Using Spark (Scala)
       • Fast data processing
       • Needs implementation
       • Format: Parquet or Avro (saves space, time and money)
     • Organize the data
       • Layer
       • Partitions
       • Purpose
       • Source
       • …
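One possible way to organize data by layer, source and partition, as listed on slide 19, is a fixed S3 key layout with Hive-style `key=value` partition folders. The sketch below only builds such a prefix; the bucket, layer, source and table names are hypothetical, and the slides do not state which layout was actually chosen.

```python
from datetime import date


def parquet_prefix(bucket: str, layer: str, source: str, table: str, day: date) -> str:
    """Build an S3 prefix with Hive-style partition folders (key=value)."""
    partitions = f"year={day.year}/month={day.month:02d}/day={day.day:02d}"
    return f"s3://{bucket}/{layer}/{source}/{table}/{partitions}/"


# Hypothetical names, used only to show the resulting layout.
example_prefix = parquet_prefix("example-datalake", "raw", "crm", "contracts", date(2018, 11, 5))
print(example_prefix)
# s3://example-datalake/raw/crm/contracts/year=2018/month=11/day=05/
```

A layout like this lets query engines skip whole folders when a query filters on the partition columns, which is where much of the time and cost saving comes from.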
  20. Using spot instances
     • Data-backup capabilities
     • Set the maximum price you are willing to bid
     • Saves time and money
     • Cons:
       • You lose the instances when the spot price exceeds your maximum price
       • Only 2 minutes to save your data
     • Hybrid model for Hadoop
       • Master and 1/3 of the workers on on-demand instances
       • Rest on spot instances
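The hybrid model on slide 20 amounts to a small piece of arithmetic: the master plus one third of the workers run on-demand, the remainder on spot. The sketch below works through that split. Rounding the on-demand share up is an assumption made here so that at least a third of the workers survive a spot interruption; the slides do not say how fractions were handled.

```python
def split_workers(total_workers: int) -> dict:
    """Split worker nodes into on-demand and spot; the master is always on-demand."""
    if total_workers < 0:
        raise ValueError("total_workers must not be negative")
    # Ceiling division by 3 (assumption: round the on-demand third up).
    workers_on_demand = (total_workers + 2) // 3
    return {
        "master_on_demand": 1,
        "workers_on_demand": workers_on_demand,
        "workers_spot": total_workers - workers_on_demand,
    }


print(split_workers(9))
# {'master_on_demand': 1, 'workers_on_demand': 3, 'workers_spot': 6}
print(split_workers(10))
# {'master_on_demand': 1, 'workers_on_demand': 4, 'workers_spot': 6}
```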
  21. Make data available via SQL
     • Create a Glue catalog with a Glue crawler
       • Scans all subfolders of an S3 path
       • Tries to recognize the right format
       • Classifies according to the file type
     • Glue catalog
       • Used as Hive metastore on an EMR cluster
       • Used in Athena for ad hoc analytics
     • Not all classifiers are perfect
       • Manual adjustments of the crawler are required
       • Manual adjustments of the table definitions are required
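To make the crawler behaviour on slide 21 more concrete, the following toy sketch walks a list of S3 object keys, groups them into table candidates, guesses a format from the file extension and picks up `key=value` folders as partition columns. It is a deliberately simplified stand-in written for illustration and is not how AWS Glue is implemented; the extension mapping, the `needs_review` flag and the sample keys are all assumptions.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a format label.
FORMATS = {".parquet": "parquet", ".avro": "avro", ".csv": "csv", ".json": "json"}


def summarize_keys(keys):
    """Group object keys into table candidates with formats and partition columns."""
    tables = defaultdict(lambda: {"formats": set(), "partition_keys": []})
    for key in keys:
        path = PurePosixPath(key)
        folders = path.parent.parts
        # Folders without "=" form the table path; "name=value" folders are partitions.
        table_name = "/".join(f for f in folders if "=" not in f)
        entry = tables[table_name]
        entry["formats"].add(FORMATS.get(path.suffix.lower(), "unknown"))
        entry["partition_keys"] = [f.split("=", 1)[0] for f in folders if "=" in f]
    return {
        name: {
            "formats": sorted(entry["formats"]),
            "partition_keys": entry["partition_keys"],
            # Mixed or unrecognized formats are flagged for manual adjustment.
            "needs_review": len(entry["formats"]) != 1 or "unknown" in entry["formats"],
        }
        for name, entry in tables.items()
    }


# Hypothetical object keys, used only to exercise the function.
sample_keys = [
    "raw/crm/contracts/year=2018/month=11/part-0000.parquet",
    "raw/crm/contracts/year=2018/month=12/part-0000.parquet",
    "raw/web/clicks/2018-11-05.csv",
    "raw/web/clicks/_SUCCESS",
]
catalog = summarize_keys(sample_keys)
for name, info in catalog.items():
    print(name, info)
```

In this sample the `raw/web/clicks` folder is flagged because of the marker file without a recognizable extension, which mirrors the kind of case where crawler settings or table definitions need manual adjustment.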
  22. Testing Exasol on AWS Marketplace
     • Starting Exasol on an EC2 instance
     • Using an EBS instance
     • Testing various instances
     • Duplicating the instance for more freedom in testing
     • Testing different server types and sizes
     • Testing licensed software (AWS Marketplace) before buying an expensive license
  23. Amazon SageMaker
     • JupyterHub
     • Python-based API
     • Focus on developing, learning, testing and distributing ML models
     • Easy switching between several algorithms
  24. Outlook
     • Combine Exasol with ML models created by SageMaker
  25. Stream Analytics & Machine Learning with AWS OC Quickstarter
  26. Section 3: Stream Analytics & Machine Learning with AWS OC Quickstarter
     • Use case
     • DWH offloading
     • Architectural overview
     • The data flow
     • Industrial use case
  27. Use case: Twitter Stream Analytics (diagram labels: Twitter, streaming data, machine learning, sentiment analysis)
  28. DWH Offloading (diagram labels: DWH, Integration Layer, Enterprise Layer, User View Layer, Source)
  29. DWH offloading (diagram labels: Data, Integration Layer, Enterprise Layer, Offload, Refined, Data Lake, User View Layer, ETL)
  30. Advantages of DWH offloading
     • Cost savings by moving data to low-cost storage
     • Combining structured data with unstructured data
  31. Technologies used
     • Scala
     • Hive, Oozie, Kafka, Spark, Sqoop: stream processing, DWH offloading, scheduling
     • Spark ML: sentiment analysis
     • AWS: infrastructure / Hadoop / HDFS / S3 / data lake
     • ELK Stack (Elasticsearch, Logstash, Kibana): visualization / indexed data access
  34. Industrial use cases
     • Predictive maintenance
     • Real-time error detection in production processes
     • Dynamic evaluation of component quality
  35. Contact us!
     • Matthias Diekstall, Developer, +49 201 892994-1753, Matthias.Diekstall@opitz-consulting.com
     • Roland Wammers, Solution Architect, +49 201 892994-1757, Roland.Wammers@opitz-consulting.com
     • Manuel Marowski, Developer, +49 201 892994-1748, Manuel.Marowski@opitz-consulting.com
     • Web: WWW.OPITZ-CONSULTING.COM
     • Social media: @OC_WIRE, OPITZCONSULTING, opitzconsulting
