Big Data in the Cloud with Informatica Cloud and Amazon Redshift

Data warehousing costs have been continually rising with the explosion of Big Data. To help you explore the most cost-effective data warehousing techniques, learn from the cloud experts from Amazon and Informatica.

Learn more: http://www.informaticacloud.com/amazon-redshift

Amazon Redshift is a petabyte-scale cloud-based data warehouse that allows you to provision multiple database nodes on demand and offload raw data from on-premise databases for more cost effective data warehousing. Getting this data into Redshift is easy with Informatica Cloud. In this interactive webinar, you’ll learn:

- How Amazon Redshift is changing the economics of data warehousing
- Why Big Data integration and management is a strategic imperative within enterprises
- How cloud integration makes cloud data warehousing even more cost effective

At Informatica, our goal is to unlock your information potential. Join us with featured guest speakers from Amazon for this interactive webinar.

Published in: Technology, Business
Notes
  • Announced Redshift; provision multiple database nodes on demand; start large petabyte-scale data warehousing projects sooner; offload raw data from on-premise databases for cost effective processing
  • Use Amazon Redshift for easy scalability; migrate completely from existing DW to Amazon Redshift; analyze data that was previously too expensive to put into a DW; deploy Redshift because provisioning existing DW systems takes months; replace HIVE with Amazon Redshift if they were using HIVE to save money
  • Encryption enhancements
  • Airbnb: 5x – 20x reduction in query times; 4x reduction in cost over HIVE. Accordant Media: 20x – 40x reduction in query times. Meteor Entertainment: queries across millions of rows running in < 10s. Nokia: 50% reduction in costs, 2x improvement in query times
  • Queries across billions of rows running in < 1 min
  • Using Amazon Redshift to power its upcoming SkyVault product; fully managed by Infor to enable customers to run business analytics; chose Redshift for performance, cost, ease-of-use, and scalability
  • Read only the data you need
  • Informatica Cloud is powered by Vibe, the same technology that powers the virtual data machine that runs the Secure Agent. You use Informatica Cloud to store the metadata mappings, and at run time the data moves directly from source to target through the Vibe Secure Agent.
  • Vibe is the industry’s first and only embeddable virtual data machine to access, aggregate and manage data, regardless of data type, source, volume, compute platform or user. It lets you map once and deploy anywhere, so you can take logic that you may have defined on-premise, move it to the cloud, and then move it to Hadoop or embed it in an application without recoding. This makes your architecture faster, more flexible, and futureproof. Business benefit: five times faster turnaround from business idea to solution; adapt the technology to your business, not vice versa; utilize all your data, regardless of location, type or volume. IT benefit: five times faster project delivery; eliminate skills gaps for adopting new technologies and approaches; reduce the cost of maintaining a complex assortment of technologies.
  • Transcript of "Big Data in the Cloud with Informatica Cloud and Amazon Redshift"

    1. Cloud and Amazon Redshift. Rahul Pathak, Amazon Redshift Product Management; Nicolas Brisoux, Informatica Cloud Platform Adoption; Darren Cunningham, Informatica Cloud Marketing. @infacloud #redshift
    2. Today’s Agenda • Informatica and Amazon Strategic Partnership • Amazon Redshift Overview • Informatica Cloud Redshift Connector • Demonstration • Discussion • Next Steps
    3. Informatica: The Information Management Leader • B2B Data Exchange: Informatica supports the requirements of cross-organizational data exchange, so users apply familiar & trusted data integration tools and techniques to the growing practice of B2B data integration. • Cloud Data Integration • Enterprise Data Integration • Complex Event Processing: Informatica received high praise for its services from customers. For deployments involving systems monitoring use cases, Informatica offers a five-day stand-up of RulePoint. • Ultra Messaging: In spite of the new entrants, Informatica remains the market leader in this highly demanding part of the messaging market. • Data Quality • Master Data Management • Application ILM
    4. Informatica Cloud: our fastest growing product line. Today’s Focus: Cloud Data Integration
    5. Informatica Cloud and Amazon Redshift: Enabling cost-effective data warehousing • Redshift Connector pre-release announced in February • General availability this month (August) • InformaticaCloud.com/Amazon-Redshift
    6. Rahul Pathak | rapathak@amazon.com | @rahulpathak • Senior Product Manager, Amazon Redshift
    7. AWS Database Services • Amazon RDS: Fully managed SQL database service for OLTP workloads • Amazon DynamoDB: Fully managed NoSQL service for massively scalable, high throughput, low latency workloads • Amazon Redshift: Fully managed fast and powerful, petabyte-scale data warehouse service • Amazon ElastiCache: Fully managed Memcached-compliant in-memory caching service
    8. We set out to build… A fast and powerful, petabyte-scale data warehouse that is: A Lot Faster, A Lot Cheaper, A Lot Simpler. Amazon Redshift
    9. Data warehousing done the AWS way • Pay as you go, no up front costs • Fast, cheap, easy to use • SQL • Easy to provision
    10. Common Customer Use Cases • Reduce costs by extending DW rather than adding HW • Migrate completely from existing DW systems • Respond faster to business; provision in minutes • Improve performance by an order of magnitude • Make more data available for analysis • Access business data via standard reporting tools • Add analytic functionality to applications • Scale DW capacity as demand grows • Reduce HW & SW costs by an order of magnitude • Traditional Enterprise DW | Companies with Big Data | SaaS Companies
    11. Progress Since Launch on Feb 14, 2013 • Fastest growing service in AWS history • Well over 1,000 customers; adding over 100 per week • Obtained SOC1 & SOC2 certification with more in progress • Deployed in US East (N. Virginia), US West (Oregon), EU (Ireland) and Asia Pacific (Tokyo) • Additional global regions coming soon
    12. Amazon Redshift Customers • Airbnb: 5x – 20x reduction in query times; 4x cost reduction over HIVE • Accordant Media: 20x – 40x reduction in query times • Nokia: 50% reduction in costs, 2x improvement in query times
    13. Amazon Redshift Customer: bit.ly • “When we want to answer a question with Redshift, we just write a SQL query and get an answer within a few minutes – if not seconds.” - Sean O’Connor, Engineer at bit.ly • Bit.ly provides social link sharing analytics, managing over 300 million shortens and 5 billion clicks each month
    14. Amazon Redshift Customer: HasOffers • “Amazon Redshift introduces a major opportunity to improve the performance of our real-time reporting, allowing us to run queries up to 50 times faster than our current OLAP solution.” - Niek Sanders, VP of Engineering, HasOffers • HasOffers records and reports billions of desktop and mobile interactions for performance marketers
    15. Amazon Redshift Customer: Infor • “This is the formula for fast and broad adoption, where customers can get consistent, accurate, and useful data fast - in weeks not months or years.” - Ali Shadman, SVP, Business Cloud & Upgrades, Infor • Infor is the world’s third largest ERP vendor, serving over 70,000 customers in 194 countries
    16. Amazon Redshift dramatically reduces I/O • Data compression • Zone maps • Direct-attached storage • Large data block sizes • With row storage you do unnecessary I/O • To get total amount, you have to read everything

         ID  | Age | State | Amount
        -----+-----+-------+--------
         123 |  20 | CA    |    500
         345 |  25 | WA    |    250
         678 |  40 | FL    |    125
         957 |  37 | WA    |    375

    17. Amazon Redshift dramatically reduces I/O • Data compression • Zone maps • Direct-attached storage • Large data block sizes • With column storage, you only read the data you need

         ID  | Age | State | Amount
        -----+-----+-------+--------
         123 |  20 | CA    |    500
         345 |  25 | WA    |    250
         678 |  40 | FL    |    125
         957 |  37 | WA    |    375
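
The row-versus-column point on slides 16 and 17 can be sketched in a few lines of Python. This is a toy model of the four-row table from the slides, not a description of how Redshift lays out data on disk; the function names and the "values read" counter are assumptions made up for the illustration.

```python
# Toy model of the four-row table on the slides; not Redshift's on-disk format.
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]

# The same data laid out one column at a time.
COLUMNS = {
    "id": [row[0] for row in ROWS],
    "age": [row[1] for row in ROWS],
    "state": [row[2] for row in ROWS],
    "amount": [row[3] for row in ROWS],
}


def total_amount_row_store(rows):
    """Sum Amount from a row layout; every field of every row is read."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row comes off disk
        total += row[3]          # but only Amount is used
    return total, values_read


def total_amount_column_store(columns):
    """Sum Amount from a column layout; only the Amount column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_values_read = total_amount_row_store(ROWS)
col_total, col_values_read = total_amount_column_store(COLUMNS)
print(row_total, row_values_read)
print(col_total, col_values_read)
```

Both layouts give the same total, but the row layout touches every field while the column layout touches only the Amount values.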
    18. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes • Columnar compression saves space & reduces I/O • Amazon Redshift analyzes and compresses your data

        analyze compression listing;

         Table   | Column         | Encoding
        ---------+----------------+----------
         listing | listid         | delta
         listing | sellerid       | delta32k
         listing | eventid        | delta32k
         listing | dateid         | bytedict
         listing | numtickets     | bytedict
         listing | priceperticket | delta32k
         listing | totalprice     | mostly32
         listing | listtime       | raw
    19. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps – Keep track of the minimum and maximum value for each block – Skip over blocks that don’t contain the data needed for a given query – Minimize unnecessary I/O • Direct-attached storage • Large data block sizes
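
A minimal sketch of the zone-map idea from slide 19, assuming small in-memory blocks: each block remembers its minimum and maximum, and a lookup skips any block whose range cannot contain the target. The `Block` class and `scan_equal` function are hypothetical names for this illustration and say nothing about Redshift's internal implementation.

```python
class Block:
    """A block of column values plus its zone-map entry (min and max)."""

    def __init__(self, values):
        self.values = list(values)
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_equal(blocks, target):
    """Return matching values and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # Zone-map check: skip the block if target is outside [min, max].
        if target < block.min_value or target > block.max_value:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if v == target)
    return matches, blocks_read


# Hypothetical age values split into three blocks.
blocks = [Block([20, 25, 22]), Block([37, 40, 38]), Block([61, 55, 70])]
matches, blocks_read = scan_equal(blocks, 40)
print(matches, blocks_read)
```

The benefit depends on how the data is ordered: the more the block ranges overlap, the fewer blocks can be skipped.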
    20. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes – Use direct-attached storage to maximize throughput – Hardware optimized for high performance data processing – Large block sizes to make the most of each read – Amazon Redshift manages durability for you
    21. Amazon Redshift architecture • Leader Node – SQL endpoint – Stores metadata – Coordinates query execution • Compute Nodes – Local, columnar storage – Execute queries in parallel – Load, backup, restore via Amazon S3 – Parallel load from Amazon DynamoDB • Single node version available • Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion; Backup; Restore
    22. Amazon Redshift runs on optimized hardware • HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate • HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed customer storage • Optimized for I/O intensive workloads • High disk density • Runs in HPC - fast network • HS1.8XL available on Amazon EC2
    23. Amazon Redshift lets you start small and grow big • Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores – Single Node (2 TB) – Cluster 2-32 Nodes (4 TB – 64 TB) • Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE – Cluster 2-100 Nodes (32 TB – 1.6 PB) • Note: Nodes not to scale
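
The capacity ranges on slide 23 follow from multiplying node count by per-node storage. A short sketch of that arithmetic, using only the figures on the slide; the dictionary and function names are made up for the example.

```python
# Compressed storage per node in TB, as stated on the slide.
NODE_STORAGE_TB = {"HS1.XL": 2, "HS1.8XL": 16}


def cluster_capacity_tb(node_type, node_count):
    """Total cluster storage in TB for a given node type and node count."""
    return NODE_STORAGE_TB[node_type] * node_count


for node_type, node_count in [("HS1.XL", 2), ("HS1.XL", 32),
                              ("HS1.8XL", 2), ("HS1.8XL", 100)]:
    print(node_type, node_count, cluster_capacity_tb(node_type, node_count), "TB")
```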
    24. Amazon Redshift is priced to let you analyze all your data • Simple Pricing: Number of Nodes x Cost per Hour • No charge for Leader Node • No upfront costs • Pay as you go

                            | Price Per Hour for HS1.XL Single Node | Effective Hourly Price Per TB | Effective Annual Price per TB
        --------------------+---------------------------------------+-------------------------------+-------------------------------
         On-Demand          | $0.850                                | $0.425                        | $3,723
         1 Year Reservation | $0.500                                | $0.250                        | $2,190
         3 Year Reservation | $0.228                                | $0.114                        | $999
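
The effective prices on slide 24 can be reproduced from the per-node hourly price, assuming 2 TB per HS1.XL node (from slide 23) and 8,760 hours in a year. The sketch below only restates the slide's arithmetic; the function names are illustrative, and the prices are the 2013 figures from the slide, not current pricing.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours; assumption used to match the slide
TB_PER_HS1_XL_NODE = 2      # from the HS1.XL node description


def effective_prices(node_price_per_hour):
    """Return (hourly price per TB, annual price per TB) for one HS1.XL node."""
    hourly_per_tb = node_price_per_hour / TB_PER_HS1_XL_NODE
    annual_per_tb = hourly_per_tb * HOURS_PER_YEAR
    return hourly_per_tb, annual_per_tb


def cluster_price_per_hour(node_count, node_price_per_hour):
    """Simple pricing from the slide: number of nodes x cost per hour."""
    return node_count * node_price_per_hour


for label, price in [("On-Demand", 0.850),
                     ("1 Year Reservation", 0.500),
                     ("3 Year Reservation", 0.228)]:
    hourly, annual = effective_prices(price)
    print(f"{label}: ${hourly:.3f} per TB-hour, ${annual:,.0f} per TB-year")
```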
    25. Amazon Redshift is easy to use • Provision in minutes • Monitor query performance • Point and click resize • Built in security • Automatic backups • Slides not intended for redistribution.
    26. Amazon Redshift has security built-in • SSL to secure data in transit • Encryption to secure data at rest – AES-256; hardware accelerated – All blocks on disks and in Amazon S3 encrypted • No direct access to compute nodes • Amazon VPC support • Diagram labels: JDBC/ODBC; Customer VPC; Internal Security Group; 10 GigE (HPC); Ingestion; Backup; Restore
    27. Amazon Redshift continuously backs up your data and recovers from failures • Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times • Backups to Amazon S3 are continuous, automatic, and incremental – Designed for eleven nines of durability • Continuous monitoring and automated recovery from failures of drives and nodes • Able to restore snapshots to any Availability Zone within a region
    28. Amazon Redshift works with your existing analysis tools • JDBC/ODBC • Amazon Redshift • More coming soon…
    29. Amazon Redshift integrates with multiple data sources • Amazon Elastic MapReduce • Amazon DynamoDB • Amazon Elastic Compute Cloud (EC2) • AWS Storage Gateway Service • Amazon Simple Storage Service (S3) • Corporate Data Center • Amazon Relational Database Service (RDS) • Amazon Redshift
    30. Today’s Agenda • Informatica and Amazon Strategic Partnership • Amazon Redshift Overview • Informatica Cloud Redshift Connector • Demonstration • Discussion • Next Steps
    31. Informatica Cloud Architecture Overview • Diagram labels (numbered callouts 1 to 4): Secure Agent; Your Company; Marketplace; Amazon Redshift
    32. Map Once. Deploy Anywhere. • ON PREMISE • HADOOP • 3rd PARTY APPLICATIONS • CLOUD
    33. Cloud Amazon Redshift Connector Demo • Nicolas Brisoux, Cloud Platform Adoption
    34. Best practices to remember… • The Amazon S3 bucket that holds the data files must be created in the same region as your cluster • Files are deleted from the Amazon S3 bucket when the upload is complete • Choose a batch size where the number of batches matches the number of slices in your cluster • Each XL node has 2 slices, each 8XL node has 16 • If you have a 2-node XL cluster (4 slices) and 40,000 rows of data, choose a batch size of 10,000 so that the load splits into 4 batches • The Informatica Cloud Redshift connector can maximize Amazon’s parallel processing capabilities this way
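
The batch-size rule on slide 34 amounts to dividing the row count by the number of slices in the cluster. A small sketch of that calculation, using the slice counts from the slide; the function name and the choice to round up when rows do not divide evenly are assumptions for the illustration, not documented connector behavior.

```python
# Slices per node, as stated on the slide.
SLICES_PER_NODE = {"XL": 2, "8XL": 16}


def suggested_batch_size(row_count, node_type, node_count):
    """Batch size that yields one batch per slice (rounded up)."""
    slices = SLICES_PER_NODE[node_type] * node_count
    # Integer ceiling division, so leftover rows do not create an extra batch.
    return -(-row_count // slices)


# The slide's example: a 2-node XL cluster and 40,000 rows.
print(suggested_batch_size(40_000, "XL", 2))
```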
    35. Informatica Cloud Amazon Redshift demonstration • Diagram labels: Firewall; Informatica Cloud; Secure Agent; Metadata Mappings • Step 1: Authenticate and retrieve Data Synchronization Task • Step 2: Retrieve Account Data • Step 3: Perform lookup on SLA level • Step 4: Put Account Data & SLA Level into Flat File • Step 5: Transfer compressed Flat File • Step 6: Initiate load from Amazon S3 • Step 7: Load data into Amazon Redshift
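
Steps 4 to 6 of the demonstration flow can be approximated with the Python standard library. This is only a sketch under stated assumptions: the slides do not say how the connector builds its files or which statement it issues, so the pipe-delimited layout, the use of Redshift's `COPY ... GZIP` statement, the table name, the S3 path, and the credentials placeholder are all hypothetical.

```python
import csv
import gzip
import io


def build_compressed_flat_file(rows):
    """Step 4-5 stand-in: write rows to a pipe-delimited flat file and gzip it."""
    text = io.StringIO()
    writer = csv.writer(text, delimiter="|", lineterminator="\n")
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def build_copy_statement(table, s3_path):
    """Step 6 stand-in: a Redshift COPY statement for a gzipped file in S3.

    The credentials string is a placeholder and must be filled in by the caller.
    """
    return (
        f"COPY {table} FROM '{s3_path}' "
        "CREDENTIALS '<aws-credentials-placeholder>' "
        "GZIP;"
    )


# Hypothetical account rows with an SLA level looked up in step 3.
payload = build_compressed_flat_file([(1, "Acme", "gold"), (2, "Globex", "silver")])
statement = build_copy_statement("account_sla", "s3://example-bucket/account_sla.gz")
print(len(payload), statement)
```

Uploading the payload to Amazon S3 and executing the statement over JDBC/ODBC are left out, since both depend on client libraries and credentials outside the scope of the slides.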
    36. PowerCenter Mappings and Informatica Cloud • If you want to reuse your existing PowerCenter mappings with Informatica Cloud and Redshift you have 2 options: • Option 1: Use the PowerCenter Repository Manager to export your existing workflows and import them into Informatica Cloud using the PowerCenter Tasks feature • Option 2: Keep your existing mappings in PowerCenter and stage the data – Create a DSS task in Informatica Cloud to move the data to Redshift from the staging area – This task can be managed from PowerCenter
    37. Why Informatica Cloud Integration for Redshift? • 1 Map Once, Deploy Anywhere • 2 Rapid Connectivity & Deployment • 3 Advanced Integration Delivered Easily • 4 Excellence in batch and real-time integration • InformaticaCloud.com
    38. Next Steps • Get started with Amazon Redshift • Get started with Informatica Cloud – InformaticaCloud.com • Learn more about our Redshift Connector – InformaticaCloud.com/Amazon-Redshift
    39. Discussion. Rahul Pathak, Amazon Redshift Product Management; Nicolas Brisoux, Informatica Cloud Platform Adoption; Darren Cunningham, Informatica Cloud Marketing. @infacloud #redshift InformaticaCloud.com