Cloud Computing Meets Data Warehousing


Introduction to data warehousing and analytics in the cloud.

Published in: Technology, Business

  • About Omer:
      • Developed Vertica’s internal cloud solution for use with PoCs
      • Created the Vertica for the Cloud offering, based on Amazon EC2
      • Heading up Vertica’s cloud and virtualization initiatives
  • Define cloud: available, scalable, and efficient... usually managed by someone else
      • Available means anyone can get resources on demand
      • Scalable means you get as much or as little as you need
      • Efficient means you can get resources quickly and in bite-sized chunks
  • Cloud Computing Meets Data Warehousing

    1. Cloud Computing Meets Data Warehousing
        Omer Trajman
        Sr. Dir. for Cloud and Virtualization, Vertica Systems
        [email_address]
    2. What is... Cloud?
        • What are Cloud Services?
        • Other People’s Software
        • What are Cloud Platforms?
        • Other People’s Frameworks
        • What is Cloud Infrastructure?
        • Other People’s Hardware
    3. Data Warehousing in the Cloud
        • Applications, e.g. Birst, LogiXML, Lucidera
            • Web app providing data services
            • Analytic SaaS: full stack solution
        • Platforms, e.g. Google AppEngine, MSFT Azure
            • Programming API (Java, Python, .Net)
            • Integrated data access
        • Infrastructure, e.g. Amazon Web Services
            • Favorite OS on demand (Linux, Solaris, Windows)
            • Additional services (SimpleDB, Queue, Storage)
    4. Security is a Tradeoff
        • “Security costs money, but it also costs in time, convenience, capabilities, …” - Bruce Schneier
        • Assess how important it is to secure your data
        • What are the risks with in-house and cloud?
        • Why not keep it under your mattress?
    5. Full Stack Offerings
        • Birst
            • Hosted data access
            • Spreadsheet in the Cloud
        • LogiXML
            • Framework for online Analytic SaaS
            • Reporting, Dashboard, ETL
        • Lucidera
            • Vertical apps focused
            • Analysis Tools (e.g. Pipeline Healthcheck)
    6. DIY Analytics in the Cloud
        • Google AppEngine, Microsoft Azure
            • Java, Python, .Net driven UI to capture and display
            • GQL or SQL to query (limited joins, no aggregates)
            • Use client tool to upload app
        • Amazon Ecosystems
            • Provision via API or service, e.g. RightScale
            • Pick your UI and ETL: Jasper, Pentaho, etc.
            • Pick your DB: MySQL, Oracle, SQL Server, Vertica
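The slide notes that platform query languages such as GQL offer limited joins and no aggregates. One common workaround, assumed here rather than stated in the deck, is to fetch the raw rows and aggregate them in application code. The sketch below uses plain Python dictionaries as stand-ins for fetched rows; the `total_by` helper and the sample `region`/`amount` fields are hypothetical and not part of any platform API.

```python
from collections import defaultdict


def total_by(rows, key, value):
    """Sum `value` per distinct `key` across an iterable of dict rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Hypothetical rows, as they might come back from a simple (non-aggregating) query.
rows = [
    {"region": "east", "amount": 10.0},
    {"region": "west", "amount": 5.0},
    {"region": "east", "amount": 20.0},
]

for region, total in sorted(total_by(rows, "region", "amount").items()):
    print(region, total)
```

This moves the aggregation cost to the application tier, so it only suits result sets small enough to pull back in full.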
    7. Key Questions
        • Is my data safe and secure?
        • Can I get fast access to my data in the cloud?
        • What does this cost?
        • Do I need a detailed plan for growth?
        • How much IT do I need?
    8. Securing the Cloud
        • Create a VPN
        • Firewall the host
        • Encrypt the disk
        • Consider where to keep sensitive data
    9. Upload and Beyond
        • PUT it, one object at a time
        • Web page upload... a few MB at a time
        • Bulk upload via FTP, SCP at 1+ GB/hour
        • Use an ETL Tool
        • Is data already in the cloud?
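To put the bulk-upload figure in perspective, the following sketch estimates transfer time at a sustained rate. It takes the slide's 1 GB/hour as a floor; the dataset sizes and the `upload_hours` helper are illustrative assumptions, not figures from the deck.

```python
def upload_hours(dataset_gb: float, rate_gb_per_hour: float = 1.0) -> float:
    """Hours needed to move `dataset_gb` at a sustained `rate_gb_per_hour`."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate_gb_per_hour must be positive")
    return dataset_gb / rate_gb_per_hour


# Hypothetical warehouse sizes, at the slide's baseline rate of 1 GB/hour.
for size_gb in (10, 100, 1000):
    hours = upload_hours(size_gb)
    print(f"{size_gb} GB: {hours:.0f} hours ({hours / 24:.1f} days)")
```

At this baseline rate a terabyte-scale load takes weeks over the wire, which is why it matters whether the data already lives in the cloud.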
    10. Economics of DW in the Cloud
        • Getting and keeping data in the cloud
            • Cloud applications are a source of data
            • Upload services (sneakernet)
        • Scale on demand
            • Transparent scaling
            • Managed or API based scaling
        • Pay as you go
            • Operational cost by resource or volume
            • Incremental changes on demand
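A rough way to reason about "operational cost by resource or volume" is to add compute charges (per node-hour) to storage charges (per GB-month). The sketch below compares an always-on cluster with one that runs only during working hours. All prices, node counts, and hours are made-up placeholders, not quotes from any provider, and `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(node_hours: float, price_per_node_hour: float,
                 stored_gb: float, price_per_gb_month: float) -> float:
    """Compute charge plus storage charge for one month."""
    return node_hours * price_per_node_hour + stored_gb * price_per_gb_month


# Placeholder prices (assumptions for illustration only).
PRICE_PER_NODE_HOUR = 0.50
PRICE_PER_GB_MONTH = 0.10
STORED_GB = 500

# Four nodes running all month (about 730 hours) versus
# four nodes running 8 hours a day on 22 working days.
always_on = monthly_cost(4 * 730, PRICE_PER_NODE_HOUR, STORED_GB, PRICE_PER_GB_MONTH)
on_demand = monthly_cost(4 * 8 * 22, PRICE_PER_NODE_HOUR, STORED_GB, PRICE_PER_GB_MONTH)

print(f"always on: {always_on:.2f}")
print(f"on demand: {on_demand:.2f}")
```

Under these assumptions the storage charge is identical in both cases; the saving comes entirely from releasing compute when it is idle.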
    11. Get Set...
        • Check out Analytic SaaS Vendors
            • Birst, LogiXML, Lucidera
        • Are you a coder? Look for a Framework
            • Google AppEngine, MSFT Azure, Elastic MapReduce
        • Looking for Classic BI?
            • Best of breed on the cloud
            • Amazon, GoGrid, RightScale, Joyent
        • Questions? [email_address]