Data platforms 2017


My session on virtualizing big data in the cloud at Data Platforms 2017

Published in: Technology

  1. Virtualizing Big Data in the Cloud and Everything in Between. Kellyn Pot’Vin-Gorman, Technical Intelligence Manager for the Office of CTO, Delphix
  2. © 2017 Delphix Corporation. Kellyn Pot’Vin-Gorman, Technical Intelligence Manager for the Office of CTO, Delphix • Two decades of experience as a multi-platform DBA (Oracle, MSSQL, MySQL, Sybase, Postgres…) • Oracle ACE Director (Alumni) • Oak Table Network • APEX Women in Technology Award, CTA 2014 • STEM education with Raspberry Pi and Python • Liaison for Denver SQL Server User Group • Rocky Mountain Oracle Conference Director and Board Director • Author, blogger, (
  3. Cloud and Big Data • Big data often depends on relational and other legacy data stores. • Multiple data sources and complex, often home-grown environments lead to human error when left without automation. • Difficult if not already in the cloud; often built in the cloud because of easy access to cloud resources. • Open source means open to discussion.
  4. Cloud Trends • 85% of enterprises have a multi-cloud strategy • 77% are hybrid cloud (different than the 2017 cloud survey) • Workloads are being run in the cloud: 41% in public clouds, 38% in private clouds • Enterprise companies are choosing cloud: 65% want public cloud, 63% want private cloud solutions, 93% will be hybrid • Source: State of the Cloud Survey, RightScale
  5. This trend will only increase in the next five years as cloud continues to overtake the industry.
  6. Big Data Project Types through 2026 [Chart: billions of dollars in big data projects per year, 2017 to 2025; series: Big Data Pro, Big Data HW, Big Data SW]
  7. Cloud Adoption and How It’s Changing [Chart: adoption from 0% to 100% for Public Cloud, Private Cloud, Hybrid Cloud and Any Cloud; series: 2016, 2017, 2018]
  8. How does virtualization fit into this? • Big data is…well, BIG and built out of necessity. • Companies report spending more time gathering and distributing big data than analyzing it. • Only 59% of big data is easily accessible, and 37% report it takes more than a day to access new data. • Another 59% report that legacy data storage systems are still hindering big data initiatives.
  9. Virtualization Adoption Trend [Chart: percentage of virtualized workloads, 2013 to 2018. Source: Gartner Forecasts]
  10. Our Use Case • Uses an online project of flat files for big data by Chris Wilson from Time Magazine, based on publicly available datasets. • Uses flat files and Ajax to produce workable datasets from the open payments data, a highly anticipated dataset.
  11. History of the Data
  12. Value of the Data [Chart: Average Payment and Percentile for Physicians from Drug Companies Per Medication; payments from $0 to $600 against percentiles 10 to 90]
  13. Delivery Method: A Real Decision • Data is large, whether or not it is a big data solution (VLDB). • 90% of data between environments is often consistent, with data appends. • How often is a solution chosen based on the skill set of those in place, and how will it support future growth? • Do you want to pay for licensing of database and client servers, as stated in our use case example? • No need to patch, upgrade, etc. The goal was to just lock down file permissions and maintain, and this resonated with many customer scenarios.
  14. Should vs. Did • Although an RDBMS with JSON would have been the preferred method to deliver the data, the author’s team made a different choice…
  15. This… “Presented a technical challenge, because our small [team] is more comfortable with client-side web development than we are with administering servers and databases. So we decided to make the whole thing searchable using only flat files and Ajax requests.” I’ve heard similar stories before.
  16. My Virtualization Demo Environment • Each zip file was under 1 GB (NOT big data), 16 GB uncompressed. • Unstructured, it was cumbersome to work with. • Gave an excellent example of network bottlenecks when transferring to the source.
  17. Per the author: “The juiciest file, which contained information on payments that medical companies made to physicians for things like meals, travel, and consulting fees, was 2.6 million lines in a single 1.44 GB file.”
  18. The Solution • Files hosted on Amazon S3 • CloudFront layer for high availability • 350K flat files to be migrated to Amazon • Over 30 seconds per file to upload. Now do this for development and test, then rinse and repeat, finally producing and releasing until complete.
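
The upload figures on this slide can be turned into a rough total. The sketch below is back-of-the-envelope arithmetic only: it assumes strictly serial uploads, uses 30 seconds as a lower bound for "over 30 seconds per file", and assumes three environments (development, test, production). None of those assumptions come from the original project, and parallel uploads would shorten the totals.

```python
# Figures from the slide: 350K flat files, "over 30 seconds" per upload.
FILES = 350_000
SECONDS_PER_FILE = 30  # lower bound; the slide says "over 30 seconds"


def serial_upload_days(files: int, seconds_per_file: float, environments: int = 1) -> float:
    """Wall-clock days needed if every file is uploaded one at a time."""
    seconds_per_day = 86_400
    return files * seconds_per_file * environments / seconds_per_day


one_env = serial_upload_days(FILES, SECONDS_PER_FILE)
# Assumption: the same upload is repeated for development, test and production.
three_envs = serial_upload_days(FILES, SECONDS_PER_FILE, environments=3)

print(f"one environment: {one_env:.1f} days")
print(f"three environments: {three_envs:.1f} days")
```

Even as a lower bound, the serial case is measured in months, which is the bottleneck the following slides set out to remove.
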
  19. To Paraphrase Hippocrates… Databases are short, files are long.
  20. How Can This Scenario Be Enhanced? • Remove the bottleneck and duplication of flat files with virtualization. • Ease the ability to access and migrate to a cloud platform from on-prem. • In a real environment, which commonly includes legacy data sources, applications and access points, containerize and simplify delivery.
  21. Virtualization Options for Big Data • Partitioning: As many big data environments partition resources across a single physical system, virtualizing is often easy with modern virtualization products. • Isolation: Many big data environments may already be on VMs; creating a virtualized dataset could eliminate the extensive storage requirements of duplicate data. • Package: Collect all tiers and dependencies for a big data solution and containerize them, making development, testing and delivery simple and automated.
  22. Introduction to Virtualization for Flat Files: Flat files can be virtualized individually or as part of a “container”, which can eliminate duplication of the immense files that are part of big data environments.
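
To make the duplication argument concrete, here is a minimal storage comparison. It reuses two figures from earlier slides (the 16 GB uncompressed demo dataset and the "90% of data between environments is often consistent" estimate) and assumes a simplified model in which virtualized copies share one base copy and store only their own changes. The three-environment count and the model itself are illustrative assumptions, not measurements of any product.

```python
def full_copy_storage_gb(dataset_gb: float, environments: int) -> float:
    """Storage when every environment keeps its own complete copy."""
    return dataset_gb * environments


def thin_copy_storage_gb(dataset_gb: float, environments: int, changed_fraction: float) -> float:
    """Storage when environments share one base copy and keep only their changes.

    Simplified model (an assumption): one shared base plus, per environment,
    the fraction of the dataset that differs from the base.
    """
    return dataset_gb + dataset_gb * changed_fraction * environments


DATASET_GB = 16          # uncompressed demo dataset size from the slides
ENVIRONMENTS = 3         # assumed: development, test, reporting
CHANGED_FRACTION = 0.10  # complement of the "90% consistent" estimate

full = full_copy_storage_gb(DATASET_GB, ENVIRONMENTS)
thin = thin_copy_storage_gb(DATASET_GB, ENVIRONMENTS, CHANGED_FRACTION)
print(f"full copies: {full:.1f} GB, thin copies: {thin:.1f} GB")
```

The gap widens with every additional environment, because each new full copy costs the whole dataset while each new thin copy costs only its changes.
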
  23. What is a Delphix vFile? • Feature for “Unstructured Files” • A directory tree of files for Delphix to manage. • Can be linked from an existing dataset on a source server into a dSource; files are then projected using NFS to a target server. • A small part of a bigger “Swiss Army knife”, as it can also virtualize relational databases, applications, etc.
  24. Prerequisites for Cloud Environments • Ensure NFS mounts are an option in the cloud environment. • No clustered environments (MSFC, Oracle RAC, etc.). • Ensure credentials are set up correctly, per the documentation.
  25. Prepare Cloud Host (if not pre-installed) • For our demo on Amazon, install the NFS client • Linux or Red Hat: sudo yum -y install nfs-utils • Ubuntu: sudo apt-get -y install nfs-common
  26. Create a New Source for the Dataset
  27. Allow for Initial Sync. Notice the length of time to sync 16 GB to the Delphix Engine.
  28. Limitations of vFiles Datasets • No interval refresh capability from the parent (must be recreated). • Pulled directly from the dataset’s parent, so take care with network and I/O performance. • Existing datasets can still rewind, take snapshots, etc. Just checking if you’re paying attention.
  29. Now Copy Source Files into the Mounted vFile Source. Take a snapshot to mark updates to files:
  30. Provision to Our Target Host • NFS mount is ready on the LinuxTarget host. • Choose to provision to the LinuxTarget. • Click Next and keep the defaults; scripts could be added as “hooks” to make changes, etc.
  31. Provisioning a vFile • My Datasets • Select the source to provision from • Select a snapshot • Click Provision • Provision vFiles and update the default to the correct mount point. • Select a target environment and add a filter.
  32. Verification: Clone Is Successful! • Disable Source • Mount Enabled Once More!
  33. vFiles Manageability • Can enable or disable from the CLI or the interface. • Configure from the same. • Easily populate all targets from one source. • Automate via DevOps as part of scripting or Jenkins/Chef jobs.
  34. Rewinding a vFile or “File Version Control” • Highlight the vFiles • On Timeflow, select the snapshot • Click on Rewind. No need to SCP, FTP or recreate!
  35. Rewind vFile
  36. Files Returned to Previous Version/Status
  37. Refresh vFile • Refresh from source • Update target with new files • Return to original files in a catastrophic situation.
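
The snapshot, rewind and refresh behavior walked through in the last few slides can be pictured with a toy copy-on-write model. This is a conceptual sketch only: the `ThinClone` class and its methods are hypothetical names invented for illustration and say nothing about how the Delphix engine is implemented.

```python
from __future__ import annotations


class ThinClone:
    """Toy copy-on-write view over a shared, read-only set of source files."""

    def __init__(self, source: dict[str, bytes]) -> None:
        self._source = source                     # shared by all clones, never modified here
        self._changes: dict[str, bytes] = {}      # files this clone has rewritten
        self._snapshots: list[dict[str, bytes]] = []

    def read(self, path: str) -> bytes:
        # Prefer the clone's own version, fall back to the shared source.
        return self._changes.get(path, self._source[path])

    def write(self, path: str, data: bytes) -> None:
        # Only the changed file is stored privately.
        self._changes[path] = data

    def snapshot(self) -> int:
        # Record the current set of private changes and return its id.
        self._snapshots.append(dict(self._changes))
        return len(self._snapshots) - 1

    def rewind(self, snapshot_id: int) -> None:
        # Restore the private changes as they were at the snapshot.
        self._changes = dict(self._snapshots[snapshot_id])

    def private_bytes(self) -> int:
        # Storage consumed by this clone beyond the shared source.
        return sum(len(data) for data in self._changes.values())


source_files = {"payments.csv": b"id,amount\n1,20\n"}
dev = ThinClone(source_files)
before = dev.snapshot()
dev.write("payments.csv", b"corrupted")
dev.rewind(before)  # no SCP, FTP or re-copy of the source needed
print(dev.read("payments.csv") == source_files["payments.csv"])
```

In this model a rewind only discards the clone's private changes, which is why it needs no file transfer from the source.
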
  38. Virtualization and Cloud: An overall trend in companies that includes autonomic computing, where the IT environment is able to manage itself based on perceived activity, and resource-shared computing, in which computer processing power is utilized and/or paid for only as needed. The goal of virtualization is to centralize administrative tasks while improving scalability and workloads via the cloud.
  39. Data Virtualization On-Prem [Diagram labels: Dataset on source server; Targets; “Projects” thin copies via NFS]
  40. The Delphix engine: • is a software appliance (VM) • is hosted on any hardware • tracks continual changes at regular intervals • uses native technology • uses a source (dSource) to track changes • can rewind changes to a flat file • uses little to no storage during the life of a vFile.
  41. Data Virtualization: Linking to a Source [Diagram labels: Source File; Delphix Virtualization Engine, 8 TB storage; Rsync (UNIX/Linux); Robocopy (Windows)]
  42. Data Virtualization: Provisioning a Virtual Database to a Target [Diagram labels: Delphix Virtualization Engine, 8 TB storage; NFS; iSCSI]
  43. Provisioning a vFile to Target NFS Mounts [Diagram labels: Delphix Virtualization Engine, 8 TB storage; two Target NFS Mounts, each connected via NFS/iSCSI]
  44. What if Changes are Made? [Diagram labels: Delphix Virtualization Engine, 8 TB storage; three Target NAS Mounts, each connected via NFS/iSCSI]
  45. Data Virtualization: Provisioning a Virtual Database to a Target [Diagram labels: Source NFS Mount; Rsync (UNIX/Linux); Robocopy (Windows); Delphix Virtualization Engine, 8 TB storage; three Target NAS Mounts, each connected via NFS/iSCSI]
  46. Now “Containerize” for Ease of Delivery [Diagram labels: Create “Container”; Rsync (UNIX/Linux); Robocopy (Windows); Delphix Virtualization Engine, 8 TB storage; DevOps, Testing and Reporting targets, each connected via NFS/iSCSI]
  47. Robust and Full Read and Write Files • As many as development, test, reporting, etc. require. • The Delphix engine tracks the changes to each of the flat files. • Containers can be created to isolate data sources, files, applications and other parts of big data environments, and as many can be delivered as each project requires.
  48. Moving to the Cloud with Standard Methods [Diagram labels: On-Prem NFS Mount; Cloud Storage NFS Mount]
  49. Standard Cloud Migration Limitations. Even if only development or test has been migrated to the cloud… • Data is migrated, but this doesn’t account for ongoing data loads or application connectivity across the network. • Refreshes are time consuming and complex. • Refreshes often use archaic methods, or replication is required. • Differences in cost structures are rarely taken into consideration in cloud migration projects from on-prem configurations.
  50. Cost Estimates for Cloud Vendors • On-demand EC2 instances: range from $0.0065/hr to $8.184/hr; reserved instances provide discounts from 29% to 75% over on-demand • Storage in Elastic Block Store (EBS): General Purpose SSD (gp2) volumes, $0.10 per GB-month of provisioned storage; Provisioned IOPS SSD (io1) volumes, $0.125 per GB-month of provisioned storage plus $0.065 per provisioned IOPS-month; Throughput Optimized HDD (st1) volumes, $0.045 per GB-month of provisioned storage • Data transfer in EC2: data in from the “internet” is free, but transfer internally within EC2 can be $0.01/GB • Source: Sep 2016
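
As a worked example of how these cost structures add up, the sketch below applies the slide's Sep 2016 EBS prices to a volume size. The helper name `ebs_monthly_cost`, the 1,000 provisioned IOPS figure, and treating 8 TB as 8,192 GB are illustrative assumptions; current pricing will differ.

```python
# Per-unit prices copied from the slide (Sep 2016 figures).
EBS_PRICE_PER_GB_MONTH = {"gp2": 0.10, "io1": 0.125, "st1": 0.045}
IO1_PRICE_PER_IOPS_MONTH = 0.065


def ebs_monthly_cost(size_gb: float, volume_type: str = "gp2", provisioned_iops: int = 0) -> float:
    """Monthly cost of one provisioned EBS volume at the slide's prices."""
    cost = size_gb * EBS_PRICE_PER_GB_MONTH[volume_type]
    if volume_type == "io1":
        # io1 volumes are billed for provisioned IOPS on top of capacity.
        cost += provisioned_iops * IO1_PRICE_PER_IOPS_MONTH
    return cost


demo_gp2 = ebs_monthly_cost(16)                                # 16 GB demo dataset
demo_io1 = ebs_monthly_cost(16, "io1", provisioned_iops=1000)  # assumed IOPS figure
engine_st1 = ebs_monthly_cost(8 * 1024, "st1")                 # 8 TB treated as 8,192 GB

print(f"gp2: ${demo_gp2:.2f}  io1: ${demo_io1:.2f}  st1 (8 TB): ${engine_st1:.2f}")
```

For small volumes on io1, the provisioned IOPS charge dominates the capacity charge, which is the kind of cost-structure difference the previous slide says is rarely considered.
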
  51. Virtualize Flat Files into the Cloud [Diagram labels: On-Prem NFS Mount; Delphix Virtualization Engine, 8 TB storage; NFS Mount on Cloud]
  52. And From the Command Line: A full and robust CLI allows for mass provisioning and scripting options that aren’t feasible with a GUI. • Manage environment • Simple and complex provisioning • User management • Storage management • DevOps automation with Jenkins, Chef and scripting.
  53. Delphix Command Line Examples
      Provision a new VDB and use the defaults:
        delphix> database provision
      After each command, note that the CLI reflects the hierarchy:
        delphix database provision> defaults
      Using the defaults, an example is used for the deployment.
      Set Environment:
        snapshot list database=Vvfiles_546 timeflow "dexample" timeflowRanges; Commit
  54. VDB Configuration
        delphix database provision *> set sourceConfig.type=OracleSIConfig
        delphix database provision *> set sourceConfig.databaseName=VEM_833
        delphix database provision *> set sourceConfig.uniqueName=VEM_833
  55. Setting Snapshots and Group
      Set the appropriate information just as you would to create a database deployment from the GUI:
        delphix database provision defaults *> set location=DEFAULT_SNAPSHOT
        delphix database provision *> set“Dev Copies"
  56. Confidential Data • All IaaS solutions provide encryption in flight and encryption at rest. • But encryption doesn’t protect data as much as it needs to be protected. Europe already requires data masking, not just data encryption, for any confidential data: 29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
  57. Encryption is Different than Masking/Obfuscation
  58. Data Masking
  59. GDPR and Our Future with Data Protection: The General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679) is a regulation by which the European Parliament, the Council of the European Union and the European Commission intend to strengthen and unify data protection for all individuals within the European Union (EU).
  60. Flat File Masking • Big data is often pulled from various data sources. • The ability to mask this flat file data has incredible potential. • The capability to mask flat files is powerful and a reason for “Agile Masking”.
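
To illustrate what masking a flat file can look like, and how it differs from encryption, here is a minimal sketch that replaces one CSV column with substitute values. It is not the Delphix masking engine or its algorithms: the function names, the substitute name list and the secret are all hypothetical. Unlike encryption, there is no key that turns the masked value back into the original; the keyed hash only makes the substitution repeatable.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical substitute values; a real masking tool would use a far larger set.
FAKE_NAMES = ["Alex Doe", "Sam Roe", "Pat Poe", "Lee Moe"]


def mask_value(value: str, secret: bytes) -> str:
    """Map a value to a substitute name; the same input always gets the same substitute."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return FAKE_NAMES[digest[0] % len(FAKE_NAMES)]


def mask_csv(text: str, column: str, secret: bytes) -> str:
    """Return the CSV text with one column replaced by masked values."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = mask_value(row[column], secret)
        writer.writerow(row)
    return out.getvalue()


sample = "physician,amount\nJane Smith,120.50\nJohn Jones,75.00\nJane Smith,20.00\n"
print(mask_csv(sample, "physician", b"demo-secret"))
```

Because the mapping is deterministic, the same physician gets the same substitute on every row, so the masked file stays usable for development and testing. With such a short substitute list, different people can collide on the same fake name.
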
  61. Agile Masking and File Formats • Multi-record • CSV • XML • Word • Excel • PowerPoint • Unstructured • EDI • Installation: Agile Masking installed with a valid license (file masking option) • Characteristics: Description and type of files
  62. Connectors and Agile Masking of Flat Files. Free Text Redaction Algorithm: This algorithm masks or redacts free-text columns of files. It uses either a Whitelist or a Blacklist to determine which words are masked or not masked. This algorithm may require additional configuration to work in the manner you desire.
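
The whitelist/blacklist idea described on this slide can be sketched in a few lines. This is an illustration of the general technique under simple assumptions (word-level matching, case-insensitive, a fixed mask string); the `redact` function is hypothetical and is not the product's algorithm or configuration.

```python
import re


def redact(text: str, words: list[str], mode: str = "blacklist", mask: str = "XXXX") -> str:
    """Redact free text word by word.

    blacklist: mask only the listed words.
    whitelist: mask every word that is NOT listed.
    """
    if mode not in ("blacklist", "whitelist"):
        raise ValueError("mode must be 'blacklist' or 'whitelist'")
    listed = {word.lower() for word in words}

    def replace(match: re.Match) -> str:
        token = match.group(0)
        is_listed = token.lower() in listed
        hide = is_listed if mode == "blacklist" else not is_listed
        return mask if hide else token

    # Words are runs of letters, digits and apostrophes; punctuation is left alone.
    return re.sub(r"[A-Za-z0-9']+", replace, text)


note = "Dr. Smith received a meal"
print(redact(note, ["smith"]))
print(redact(note, ["dr", "received", "a", "meal"], mode="whitelist"))
```

A blacklist is easier to start with but misses any sensitive word nobody thought to list, while a whitelist fails safe by masking everything unknown, at the cost of more configuration.
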
  63. vFiles, Masking, the Cloud: The Whole Picture [Diagram labels: UNIX Source, 8 TB database; Delphix Virtualization Engine, 8 TB storage; Delphix Masking Engine; Delphix Virtualization Engine, 8 TB storage; Unix Target]
  64. vFiles, Masking, the Cloud: The Whole Picture [Diagram labels: Unix Source; Delphix Virtualization Engine, 8 TB storage; Delphix Masking Engine; Delphix Virtualization Engine, 8 TB storage; Unix Target]
  65. Summary: Solutions for Cloud Migrations Using Virtualization • Review large data sets, both in legacy data sources and in structured flat files, for opportunities to be delivered to the cloud. • Consider virtualizing data sets to avoid latency issues. • “Containerize” environments for easy delivery of complex builds. • With GDPR and security a higher priority, consider masking non-production data/files and encrypting/securing production.
  66. Thank you! Please fill out the session survey.