Big Data in the Cloud with Informatica Cloud and Amazon Redshift


Data warehousing costs have been continually rising with the explosion of Big Data. To help you explore the most cost-effective data warehousing techniques, learn from the cloud experts from Amazon and Informatica.

Learn more: http://www.informaticacloud.com/amazon-redshift

Amazon Redshift is a petabyte-scale cloud-based data warehouse that allows you to provision multiple database nodes on demand and offload raw data from on-premise databases for more cost-effective data warehousing. Getting this data into Redshift is easy with Informatica Cloud. In this interactive webinar, you’ll learn:

- How Amazon Redshift is changing the economics of data warehousing
- Why Big Data integration and management is a strategic imperative within enterprises
- How cloud integration makes cloud data warehousing even more cost-effective

At Informatica, our goal is to unlock your information potential. Join us with featured guest speakers from Amazon for this interactive webinar.

  • Announced Redshift. Provision multiple database nodes on demand. Start large petabyte-scale data warehousing projects sooner. Offload raw data from on-premise databases for cost-effective processing.
  • Use Amazon Redshift for easy scalability. Migrate completely from existing DW to Amazon Redshift. Analyze data that was previously too expensive to put into a DW. Deploy Redshift because provisioning existing DW systems takes months. Replace HIVE with Amazon Redshift if they were using HIVE to save money.
  • Encryption enhancements
  • Airbnb: 5x – 20x reduction in query times; 4x reduction in cost over HIVE. Accordant Media: 20x – 40x reduction in query times. Meteor Entertainment: Queries across millions of rows running in < 10s. Nokia: 50% reduction in costs, 2x improvement in query times.
  • Queries across billions of rows running in < 1 min
  • Using Amazon Redshift to power its upcoming SkyVault product. Fully managed by Infor to enable customers to run business analytics. Chose Redshift for performance, cost, ease-of-use, and scalability.
  • Read only the data you need
  • Informatica Cloud is powered by Vibe, the same technology that powers the virtual data machine that runs the Secure Agent. You use Informatica Cloud to store the various metadata mappings, and at run time the data moves directly from source to target through the Vibe Secure Agent.
  • Vibe is the industry’s first and only embeddable virtual data machine to access, aggregate and manage data, regardless of data type, source, volume, compute platform or user. It lets you map once and deploy anywhere, so you can take logic that may have been defined on-premise and move it to the cloud, then move it to Hadoop or embed it in an application, all without recoding. This makes your architecture faster, more flexible, and future-proof. Business benefits: five times faster turnaround from business idea to solution; adapt the technology to your business, not vice versa; utilize all your data, regardless of location, type or volume. IT benefits: five times faster project delivery; eliminate skills gaps for adopting new technologies and approaches; reduce the cost of maintaining a complex assortment of technologies.

Transcript

  • 1. Cloud and Amazon Redshift Rahul Pathak, Amazon Redshift Product Management Nicolas Brisoux, Informatica Cloud Platform Adoption Darren Cunningham, Informatica Cloud Marketing @infacloud #redshift
  • 2. Today’s Agenda • Informatica and Amazon Strategic Partnership • Amazon Redshift Overview • Informatica Cloud Redshift Connector • Demonstration • Discussion • Next Steps
  • 3. Informatica: The Information Management Leader • B2B Data Exchange: Informatica supports the requirements of cross-organizational data exchange, so users apply familiar & trusted data integration tools and techniques to the growing practice of B2B data integration. • Cloud Data Integration • Enterprise Data Integration • Complex Event Processing: Informatica received high praise for its services from customers. For deployments involving systems monitoring use cases, Informatica offers a five-day stand-up of RulePoint. • Ultra Messaging: In spite of the new entrants, Informatica remains the market leader in this highly demanding part of the messaging market. • Data Quality • Master Data Management • Application ILM
  • 4. Informatica Cloud: our fastest growing product line. Today’s Focus: Cloud Data Integration
  • 5. Informatica Cloud and Amazon Redshift: Enabling cost-effective data warehousing • Redshift Connector pre-release announced in February • General availability this month (August) InformaticaCloud.com/Amazon-Redshift
  • 6. Rahul Pathak | rapathak@amazon.com | @rahulpathak Senior Product Manager Amazon Redshift
  • 7. AWS Database Services • Amazon RDS: Fully managed SQL database service for OLTP workloads • Amazon DynamoDB: Fully managed NoSQL service for massively scalable, high throughput, low latency workloads • Amazon Redshift: Fully managed fast and powerful, petabyte-scale data warehouse service • Amazon ElastiCache: Fully managed Memcached-compliant in-memory caching service
  • 8. We set out to build… A fast and powerful, petabyte-scale data warehouse that is: A Lot Faster, A Lot Cheaper, A Lot Simpler. Amazon Redshift
  • 9. Data warehousing done the AWS way • Pay as you go, no up front costs • Fast, cheap, easy to use • SQL • Easy to provision
  • 10. Common Customer Use Cases • Reduce costs by extending DW rather than adding HW • Migrate completely from existing DW systems • Respond faster to business; provision in minutes • Improve performance by an order of magnitude • Make more data available for analysis • Access business data via standard reporting tools • Add analytic functionality to applications • Scale DW capacity as demand grows • Reduce HW & SW costs by an order of magnitude. Customer segments: Traditional Enterprise DW, Companies with Big Data, SaaS Companies
  • 11. Progress Since Launch on Feb 14, 2013 • Fastest growing service in AWS history • Well over 1,000 customers; adding over 100 per week • Obtained SOC1 & SOC2 certification with more in progress • Deployed in US East (N. Virginia), US West (Oregon), EU (Ireland) and Asia Pacific (Tokyo) • Additional global regions coming soon
  • 12. Amazon Redshift Customers • Airbnb: 5x – 20x reduction in query times; 4x cost reduction over HIVE • Accordant Media: 20x – 40x reduction in query times • Nokia: 50% reduction in costs, 2x improvement in query times
  • 13. Amazon Redshift Customer: bit.ly “When we want to answer a question with Redshift, we just write a SQL query and get an answer within a few minutes – if not seconds.” - Sean O’Connor, Engineer at bit.ly Bit.ly provides social link sharing analytics, managing over 300 million shortens and 5 billion clicks each month
  • 14. Amazon Redshift Customer: HasOffers “Amazon Redshift introduces a major opportunity to improve the performance of our real-time reporting, allowing us to run queries up to 50 times faster than our current OLAP solution.” - Niek Sanders, VP of Engineering, HasOffers HasOffers records and reports billions of desktop and mobile interactions for performance marketers
  • 15. Amazon Redshift Customer: Infor “This is the formula for fast and broad adoption, where customers can get consistent, accurate, and useful data fast - in weeks not months or years.” - Ali Shadman, SVP, Business Cloud & Upgrades, Infor Infor is the world’s third largest ERP vendor, serving over 70,000 customers in 194 countries
  • 16. Amazon Redshift dramatically reduces I/O • Data compression • Zone maps • Direct-attached storage • Large data block sizes • With row storage you do unnecessary I/O • To get total amount, you have to read everything
      ID  | Age | State | Amount
      ----+-----+-------+-------
      123 | 20  | CA    | 500
      345 | 25  | WA    | 250
      678 | 40  | FL    | 125
      957 | 37  | WA    | 375
  • 17. Amazon Redshift dramatically reduces I/O • Data compression • Zone maps • Direct-attached storage • Large data block sizes • With column storage, you only read the data you need
      ID  | Age | State | Amount
      ----+-----+-------+-------
      123 | 20  | CA    | 500
      345 | 25  | WA    | 250
      678 | 40  | FL    | 125
      957 | 37  | WA    | 375
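The difference between slides 16 and 17 can be sketched in a few lines of Python. This is a toy model, not Redshift's storage engine: it uses the four-row table from the slides and counts "values touched" as a stand-in for I/O. The function names are illustrative only.

```python
# Toy model: count how many stored values must be touched to sum "amount".
# The data is the four-row table from the slides; everything else is illustrative.
rows = [
    {"id": 123, "age": 20, "state": "CA", "amount": 500},
    {"id": 345, "age": 25, "state": "WA", "amount": 250},
    {"id": 678, "age": 40, "state": "FL", "amount": 125},
    {"id": 957, "age": 37, "state": "WA", "amount": 375},
]


def total_amount_row_store(rows):
    """Row layout: every column of every row is read to reach 'amount'."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row comes off disk together
        total += row["amount"]
    return total, values_read


def total_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


# Pivot the same data into one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

print(total_amount_row_store(rows))        # (total, values read) for row storage
print(total_amount_column_store(columns))  # (total, values read) for column storage
```

Both calls return the same total, but the row layout touches all 16 stored values while the column layout touches only the 4 in the Amount column.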
  • 18. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes • Columnar compression saves space & reduces I/O • Amazon Redshift analyzes and compresses your data
      analyze compression listing;

       Table   | Column         | Encoding
      ---------+----------------+----------
       listing | listid         | delta
       listing | sellerid       | delta32k
       listing | eventid        | delta32k
       listing | dateid         | bytedict
       listing | numtickets     | bytedict
       listing | priceperticket | delta32k
       listing | totalprice     | mostly32
       listing | listtime       | raw
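To give a feel for why an encoding such as delta suits a column like listid, here is a simplified Python sketch of delta encoding in general: keep the first value, then store only the difference between consecutive values. It illustrates the idea only and is not Redshift's on-disk format; the sample values are made up.

```python
def delta_encode(values):
    """Keep the first value, then the difference from each previous value."""
    if not values:
        return []
    encoded = [values[0]]
    for previous, current in zip(values, values[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    values = []
    running = 0
    for i, item in enumerate(encoded):
        running = item if i == 0 else running + item
        values.append(running)
    return values


# Hypothetical sequential IDs: large numbers, but small gaps between neighbours.
list_ids = [100000, 100001, 100003, 100006]
encoded = delta_encode(list_ids)
print(encoded)
print(delta_decode(encoded) == list_ids)
```

After the first entry, the encoded column holds only small numbers, which need fewer bytes than the original values.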
  • 19. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes • Keep track of the minimum and maximum value for each block • Skip over blocks that don’t contain the data needed for a given query • Minimize unnecessary I/O
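The zone-map idea on this slide (record each block's minimum and maximum, then skip blocks that cannot match) can be sketched as follows. This is a minimal illustration under assumed names and a tiny block size, not a description of Redshift's internals.

```python
def build_zone_map(values, block_size):
    """Split a column into blocks and record (min, max, block) for each."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block), block) for block in blocks]


def scan_equal(zone_map, target):
    """Return matching values and how many blocks actually had to be read."""
    matches = []
    blocks_read = 0
    for low, high, block in zone_map:
        if target < low or target > high:
            continue  # the block cannot contain the target, so skip it
        blocks_read += 1
        matches.extend(value for value in block if value == target)
    return matches, blocks_read


# Hypothetical sorted column split into three blocks of four values.
zone_map = build_zone_map(list(range(1, 13)), block_size=4)
print(scan_equal(zone_map, 6))
```

Here only the middle block (values 5 to 8) is read; the other two are skipped on the strength of their min/max alone. The benefit is largest when the column is sorted or clustered, because block ranges then overlap little.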
  • 20. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes • Use direct-attached storage to maximize throughput • Hardware optimized for high performance data processing • Large block sizes to make the most of each read • Amazon Redshift manages durability for you
  • 21. Amazon Redshift architecture • Leader Node – SQL endpoint – Stores metadata – Coordinates query execution • Compute Nodes – Local, columnar storage – Execute queries in parallel – Load, backup, restore via Amazon S3 – Parallel load from Amazon DynamoDB • Single node version available [Diagram labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]
  • 22. Amazon Redshift runs on optimized hardware • HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate • HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed customer storage • Optimized for I/O intensive workloads • High disk density • Runs in HPC - fast network • HS1.8XL available on Amazon EC2
  • 23. Amazon Redshift lets you start small and grow big • Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores; Single Node (2 TB); Cluster 2-32 Nodes (4 TB – 64 TB) • Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE; Cluster 2-100 Nodes (32 TB – 1.6 PB) • Note: Nodes not to scale
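The capacity ranges on slide 23 follow directly from node count times per-node storage. A small sketch that reproduces them, using only the figures from the slide (the dictionary and function names are illustrative):

```python
# Compressed storage per node, in TB, as stated on the slide.
NODE_STORAGE_TB = {"HS1.XL": 2, "HS1.8XL": 16}


def cluster_capacity_tb(node_type, node_count):
    """Total cluster storage is simply nodes multiplied by storage per node."""
    return NODE_STORAGE_TB[node_type] * node_count


print(cluster_capacity_tb("HS1.XL", 2), cluster_capacity_tb("HS1.XL", 32))     # XL range in TB
print(cluster_capacity_tb("HS1.8XL", 2), cluster_capacity_tb("HS1.8XL", 100))  # 8XL range in TB
```

The largest 8XL cluster works out to 1,600 TB, which is the 1.6 PB quoted on the slide.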
  • 24. Amazon Redshift is priced to let you analyze all your data • Simple Pricing: Number of Nodes x Cost per Hour • No charge for Leader Node • No upfront costs • Pay as you go
                         | Price per hour for    | Effective hourly | Effective annual
                         | HS1.XL single node    | price per TB     | price per TB
      -------------------+-----------------------+------------------+-----------------
      On-Demand          | $0.850                | $0.425           | $3,723
      1 Year Reservation | $0.500                | $0.250           | $2,190
      3 Year Reservation | $0.228                | $0.114           | $999
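The per-TB columns in the pricing table can be derived from the hourly node price, given that an HS1.XL node holds 2 TB (slide 22) and assuming a 365-day year of 8,760 hours. The sketch below reproduces that arithmetic along with the "Number of Nodes x Cost per Hour" rule; the prices are those shown on the slide and the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours, assuming a non-leap year
HS1_XL_STORAGE_TB = 2       # compressed storage per HS1.XL node (slide 22)

# Hourly price per HS1.XL node, in USD, as listed on the slide.
HOURLY_PRICE_USD = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def price_per_tb_hour(node_price_per_hour):
    """Effective hourly price per TB for a 2 TB node."""
    return node_price_per_hour / HS1_XL_STORAGE_TB


def price_per_tb_year(node_price_per_hour):
    """Effective annual price per TB."""
    return price_per_tb_hour(node_price_per_hour) * HOURS_PER_YEAR


def cluster_cost_per_hour(node_count, node_price_per_hour):
    """Simple pricing: number of nodes times cost per hour (leader node is free)."""
    return node_count * node_price_per_hour


for option, price in HOURLY_PRICE_USD.items():
    print(option, round(price_per_tb_hour(price), 3), round(price_per_tb_year(price)))
```

Rounded to whole dollars, the annual figures match the table: 3,723, 2,190 and 999.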
  • 25. Amazon Redshift is easy to use • Provision in minutes • Monitor query performance • Point and click resize • Built in security • Automatic backups Slides not intended for redistribution.
  • 26. Amazon Redshift has security built-in • SSL to secure data in transit • Encryption to secure data at rest – AES-256; hardware accelerated – All blocks on disks and in Amazon S3 encrypted • No direct access to compute nodes • Amazon VPC support [Diagram labels: Customer VPC, Internal Security Group, JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]
  • 27. Amazon Redshift continuously backs up your data and recovers from failures • Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times • Backups to Amazon S3 are continuous, automatic, and incremental – Designed for eleven nines of durability • Continuous monitoring and automated recovery from failures of drives and nodes • Able to restore snapshots to any Availability Zone within a region
  • 28. Amazon Redshift works with your existing analysis tools More coming soon… JDBC/ODBC Amazon Redshift
  • 29. Amazon Redshift integrates with multiple data sources: Amazon Elastic MapReduce, Amazon DynamoDB, Amazon Elastic Compute Cloud (EC2), AWS Storage Gateway Service, Amazon Simple Storage Service (S3), Corporate Data Center, Amazon Relational Database Service (RDS)
  • 30. Today’s Agenda • Informatica and Amazon Strategic Partnership • Amazon Redshift Overview • Informatica Cloud Redshift Connector • Demonstration • Discussion • Next Steps
  • 31. Informatica Cloud Architecture Overview [Diagram with numbered callouts 1-4; labels: Secure Agent, Your Company, Marketplace, Amazon Redshift]
  • 32. Map Once. Deploy Anywhere. [Diagram labels: On Premise, Hadoop, 3rd Party Applications, Cloud]
  • 33. Cloud Amazon Redshift Connector Demo Nicolas Brisoux, Cloud Platform Adoption
  • 34. Best practices to remember… • The Amazon S3 bucket that holds the data files must be created in the same region as your cluster • Files are deleted from Amazon S3 bucket when upload is complete • Choose a batch size where the number of batches matches the number of slices in your cluster • Each XL node has 2 slices, each 8XL node has 16 • If you have a 2 node XL cluster and 40,000 rows of data, choose a batch size of 10,000 • The Informatica Cloud Redshift connector can maximize Amazon’s parallel processing capabilities this way
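The batch-size rule from these best practices is simple arithmetic: count the slices in the cluster, then divide the row count so that the number of batches equals the number of slices. A minimal sketch, using the slices-per-node figures from the slide; the function name and the choice to round up are assumptions for illustration.

```python
import math

# Slices per node, as stated on the slide.
SLICES_PER_NODE = {"XL": 2, "8XL": 16}


def recommended_batch_size(row_count, node_type, node_count):
    """Pick a batch size so the number of batches matches the number of slices."""
    slices = SLICES_PER_NODE[node_type] * node_count
    # Round up so no more than `slices` batches are produced (assumed choice).
    return math.ceil(row_count / slices)


# The slide's example: a 2-node XL cluster and 40,000 rows.
print(recommended_batch_size(40_000, "XL", 2))
```

A 2-node XL cluster has 4 slices, so 40,000 rows split into 4 batches of 10,000, matching the slide's example.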
  • 35. Informatica Cloud Amazon Redshift demonstration [Diagram labels: Firewall, Informatica Cloud, Secure Agent, Metadata Mappings] 1. Authenticate and retrieve Data Synchronization Task 2. Retrieve Account Data 3. Perform lookup on SLA level 4. Put Account Data & SLA Level into Flat File 5. Transfer compressed Flat File 6. Initiate load from Amazon S3 7. Load data into Amazon Redshift
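The slides do not show how the connector performs the load in steps 6 and 7. One plausible reading, offered here purely as an assumption, is a Redshift COPY from the compressed flat file in Amazon S3. The sketch below only builds such a statement as a string; the table, bucket and key names are hypothetical, and the credentials are placeholders.

```python
def build_copy_statement(table, bucket, key, access_key_id, secret_access_key):
    """Build a Redshift COPY statement for a gzip-compressed, comma-delimited S3 file.

    Assumption: this mirrors what a load from S3 could look like; it is not
    taken from the Informatica connector. Inputs are interpolated directly, so
    real code should validate identifiers and keep credentials out of logs.
    """
    s3_path = f"s3://{bucket}/{key}"
    credentials = (
        f"aws_access_key_id={access_key_id};"
        f"aws_secret_access_key={secret_access_key}"
    )
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"CREDENTIALS '{credentials}' "
        "DELIMITER ',' GZIP;"
    )


# Hypothetical names throughout; nothing here comes from the demo itself.
statement = build_copy_statement(
    table="account_sla",
    bucket="example-bucket",
    key="staging/accounts.csv.gz",
    access_key_id="<access-key-id>",
    secret_access_key="<secret-access-key>",
)
print(statement)
```

The statement would be run over the cluster's JDBC/ODBC endpoint, with the bucket in the same region as the cluster, as the best practices on slide 34 require.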
  • 36. PowerCenter Mappings and Informatica Cloud • If you want to reuse your existing PowerCenter mappings with Informatica Cloud and Redshift, you have 2 options: 1. Use the PowerCenter Repository Manager to export your existing workflows and import them into Informatica Cloud using the PowerCenter Tasks feature. Or 2. Keep your existing mappings in PowerCenter and stage the data; create a DSS task in Informatica Cloud to move the data to Redshift from the staging area. This task can be managed from PowerCenter.
  • 37. Why Informatica Cloud Integration for Redshift? 1. Map Once, Deploy Anywhere 2. Rapid Connectivity & Deployment 3. Advanced Integration Delivered Easily 4. Excellence in batch and real-time integration InformaticaCloud.com
  • 38. Next Steps • Get started with Amazon Redshift • Get started with Informatica Cloud • InformaticaCloud.com • Learn more about our Redshift Connector • InformaticaCloud.com/Amazon-Redshift
  • 39. Discussion Rahul Pathak, Amazon Redshift Product Management Nicolas Brisoux, Informatica Cloud Platform Adoption Darren Cunningham, Informatica Cloud Marketing @infacloud #redshift InformaticaCloud.com