AWS Customer Presentation - JovianDATA
Satya and Anupam discuss how JovianDATA uses AWS and share some lessons learned. AWS Startup Tour - SV - 2010

  • JovianDATA’s mission is to provide a technology platform to help users optimize the entire digital marketing funnel at the lowest cost.
  • At its core, JovianDATA solves the “analytics on large data” problem. Our customers have huge amounts of digital data, and they face three central challenges when trying to analyze it: huge upfront and ongoing CapEx; heavy maintenance and over-provisioning; and a lack of application richness. All of them want to move to the cloud, but they are not sure about the CapEx benefits or about the readiness of the cloud to support the complex application stacks typical of such installations, which leads to a second-order problem: application provisioning challenges.
  • We believe a fully integrated stack built on the cloud, using sophisticated distributed technology with commodity components, is key to a high-performance, low-cost solution for tackling large data. AWS’s rich cloud functionality and JovianDATA’s distributed technology make analytics on large data in the cloud possible. We can take impression data and site data and marry them with third-party data and sales data, thereby providing unified, correlated, and sophisticated analysis to the various players in the ad ecosystem.
  • Let me present a brief overview of the system before we go into the details later in the presentation. The JovianDATA SaaS system takes data, provides rich analytics, and handles everything in between. A distributed ETL layer built on top of Java and MySQL takes raw data and applies single-event filters and transformations. The data is then loaded into our massively parallel warehouse, where more complex rules that look at all the data (as opposed to single rows) are applied. We also collect statistics here, which are used for data cleansing as well as to size the model. Using the statistics, we build a proprietary array-based structure which provides the illusion of a fully materialized cube; the structure is distributed across the cluster. An MDX engine then takes a multi-dimensional query, breaks it into tuples, and calculates them in parallel across the cluster. One of the key aspects of the JovianDATA system is that the load involves complex workflows, and we have a framework to manage these workflows very efficiently. The results are apparent: in one test run by a customer, 10 users ran 450 reports on a 2.5 TB warehouse for 6 hours. 90% of the reports returned in less than 10 seconds, 40% in less than 100 ms, and the longest report took 113 seconds. Now Anupam will go into the details of the system.
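The MDX engine itself is proprietary, but the "break a multi-dimensional query into tuples and compute them in parallel" step can be illustrated with a minimal sketch; the toy cube, axis members, and function names below are hypothetical, not JovianDATA's actual engine:

```python
# Minimal sketch of decomposing a multi-dimensional query into tuples and
# evaluating them in parallel. All names here are hypothetical illustrations.
from itertools import product
from multiprocessing import Pool

# Toy pre-aggregated "cube": (geo, site_section) -> impressions
CUBE = {
    ("US", "home"): 120_000, ("US", "sports"): 80_000,
    ("UK", "home"): 40_000,  ("UK", "sports"): 15_000,
}

def expand_query(members_per_axis):
    """Cross the requested members on each axis into concrete tuples,
    the unit of parallel work."""
    return list(product(*members_per_axis))

def evaluate(tup):
    """Resolve one tuple against the cube; a real engine would route this
    to the cluster node owning the tuple's partition."""
    return tup, CUBE.get(tup, 0)

if __name__ == "__main__":
    tuples = expand_query([["US", "UK"], ["home", "sports"]])
    with Pool(4) as pool:                  # stands in for the node cluster
        for tup, value in pool.map(evaluate, tuples):
            print(tup, value)
```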
  • CLICK 1: Let's contrast data processing on the cloud with the conventional enterprise data center. A user comes in to run analytics. CLICK 2: One of the biggest misconceptions about the cloud is that all you need to do is let the analyst connect to a data center in the cloud: create an AMI with your favorite software (MySQL, Hadoop, Oracle, etc.) and bring up hundreds of instances. This is like saying EC2 is just about ec2-run-instances. At JovianDATA, we have created an analytics engine that revisits three key expensive operations in classical stacks; each operation has been rewritten to exploit the cloud. CLICK 3: Instead of keeping large clusters to run expensive grouping, we believe in bringing up an extraordinary number of nodes for a short time. CLICK 4: This pre-calculation allows us to reduce disk I/O requirements at run time. CLICK 5: Most clouds do not excel at inter-processor communication, and a big reason for network I/O is the notion of joining tables in databases. CLICK 6: We eliminate joins of large tables for the cube structure by using a patent-pending partitioning technique for multi-dimensional data. CLICK 7: To allow many users to load the same reports again and again, many of our digital media customers use materialized views, which require DBA intervention. CLICK 8: Instead, we materialize views based on the customer's usage, without a DBA. This makes often-used data available to a multitude of users.
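The actual partitioning technique is patent-pending and not described in the talk; as a generic illustration of how multi-dimensional keys can be routed to nodes so that cube lookups avoid cross-node joins, consider this sketch (the node list and key format are hypothetical):

```python
# Generic hash-partitioning sketch: route each multi-dimensional key to a
# node so lookups never need a cross-node join. This illustrates the idea
# only; it is not JovianDATA's patent-pending scheme.
import hashlib

NODES = ["node1", "node2", "node3", "node4"]

def node_for(dimension_key):
    """Map a tuple of dimension members, e.g. ('US', 'home', '2010-06-01'),
    to the node that owns it."""
    digest = hashlib.md5("|".join(dimension_key).encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Fact rows and the cube cells derived from them hash to the same node,
# so answering a query for a cell is a single-node lookup, not a
# distributed join.
print(node_for(("US", "home", "2010-06-01")))
```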
  • JovianDATA works with customers who generate tens of terabytes of data. In two years, we have identified three major problems that compel enterprise customers to move their analytics to the cloud. Capital expenditure: getting a BI project running on 10 terabytes requires nearly a year of planning with an upfront six-figure investment just to load and process the data. In this presentation, we will show how you don't have to sign up for a six-figure sum to try out 10 TB analytics. Over-provisioning: anybody who has worked with a real-world analytics stack knows that half the cluster lies underutilized nearly all the time. To add insult to injury, there are periods when the entire cluster lies unused some 20-30% of the time, like weekends and nights. All these nodes were allocated because of some peak usage on Monday morning. We will show you how we avoid provisioning for peak; instead, we provision based on usage. Application isolation: even after assigning hundreds of nodes to a task, applications keep running into each other in a real-life deployment. The cloud provides a great opportunity to provision applications in their own sandboxes.
  • CLICK 1: Let's look at a classical analytics stack for 15 TB of data. A monolithic stack deployed in the cloud seldom exploits the cloud; in most cases, all it can take care of is expansion. CLICK 2: If we keep 15 TB up on 100 nodes around the clock, it can cost up to $28,800 a month. Is it cheaper than maintaining your own data center? Absolutely. Does it really use the cloud's pay-as-you-go capability? Absolutely not. Let's look at an architecture that was built for the cloud rather than a retrofit of an old architecture.
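As a sanity check on the dollar figures quoted in these slides, here is a back-of-the-envelope calculation; the $0.40/node-hour rate is an assumption (a typical large-instance on-demand price circa 2010), not a number stated by the speakers:

```python
# Back-of-the-envelope check of the costs quoted on these slides, assuming
# a nominal on-demand rate of $0.40 per node-hour (an assumption; the exact
# rate used by the speakers is not stated).
RATE = 0.40            # $ per node-hour (assumed)
HOURS_PER_MONTH = 720  # 30 days x 24 hours

always_on_100 = 100 * HOURS_PER_MONTH * RATE  # monolithic 15 TB cluster
always_on_50  = 50 * HOURS_PER_MONTH * RATE   # 50 'live' nodes per month
on_demand_50  = 50 * 16 * RATE                # 50 nodes, ~16 hours of fortnightly analysis

print(always_on_100)  # 28800.0 -> matches the $28,800/month slide
print(always_on_50)   # 14400.0 -> matches the $14,400/month slide
print(on_demand_50)   # 320.0   -> matches the $320 slide
```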
  • In JovianDATA, nodes are allocated based on the needs of a particular stage of data processing. CLICK 1: Here we show an example flow where data becomes available sometime at night on the DoubleClick FTP server. CLICK 2: As the moon rises (so to speak), the data cleansing stage starts off and completes the model building during the hours of the night. CLICK 3: The generated model is then hibernated to S3 or EBS, and the data cleansing and model building clusters are terminated. CLICK 4: As the sun rises (so to speak), the query cluster is allocated and the model is restored. Our experience here is that even though there are no rampant node failures on EC2, we do see failures during transportation of data from one cluster to another. For that, we have invented a patent-pending technology which tracks data transportation minutely to make sure data does not disappear while moving from one stage to another. CLICK 5: The main message here is that we never create a ‘full’ cluster; instead, we employ role-based clusters. This has a dramatic effect on cost. We believe enterprise software stacks need to move to role-based clusters if they want 10x savings; otherwise, they still carry the CapEx of allocating hundreds of nodes in the cloud.
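As a concrete sketch of this role-based lifecycle, here is roughly what it might look like with today's boto3 SDK (which postdates this 2010 talk); the AMI ID, bucket, key, and tag names are hypothetical:

```python
# Sketch of a role-based cluster lifecycle: short-lived clusters per stage,
# with the model hibernated to S3 in between. All identifiers hypothetical.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
s3 = boto3.client("s3")

def run_role_cluster(role, count):
    """Bring up a cluster dedicated to one stage (role) of processing."""
    resp = ec2.run_instances(
        ImageId="ami-12345678",          # hypothetical analytics AMI
        MinCount=count, MaxCount=count,
        InstanceType="m1.large",
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "role", "Value": role}],
        }],
    )
    return [i["InstanceId"] for i in resp["Instances"]]

# Night: cleanse the data and build the model on a temporary cluster ...
loaders = run_role_cluster("load-model", count=10)
# ... run the load workflow, then hibernate the model and release the nodes.
s3.upload_file("/data/model.bin", "jovian-models", "model/2010-06-01.bin")
ec2.terminate_instances(InstanceIds=loaders)

# Morning: a separate, smaller query cluster restores the model.
queriers = run_role_cluster("query", count=5)
s3.download_file("jovian-models", "model/2010-06-01.bin", "/data/model.bin")
```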
  • Almost everybody talks about cloud computing enabling on-demand performance. But how easy is it to dial up performance based on usage? Is it just about adding more nodes to the cluster? How should the data be redistributed? Should it be done blindly? CLICK 1: At JovianDATA, we believe in selective replication to enable performance. Let's see selective replication in action. CLICK 2: Say a customer is running a Site Section report. If the report takes 10 minutes, we will dynamically provision 2 new nodes. But it does not make sense to just copy data over to those nodes. CLICK 3: Instead, we create copies of the hottest partitions on these new nodes. The hotness of data is defined by a statistics package that provides complete visibility into the usage of the cluster. CLICK 4: With selective replication of hot data, the report is generated in 30 seconds rather than 10 minutes. CLICK 5: Statistics and automatic algorithms are important because we have to do this selective replication on terabytes of data.
  • Once the usage goes down, the nodes are returned through terminate-instance and the replicas vanish. Thus, we have increased performance by nearly 20 times without increasing the cost of the analytics stack. The key message here is that blind addition of nodes is not dynamic provisioning; dynamic provisioning should work hand in hand with intelligent replication of data.
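A minimal sketch of this usage-driven replication policy, assuming a simple access log as the statistics source (the real system's statistics package and algorithms are not disclosed; partition and node names follow the example slides):

```python
# Sketch of selective replication: copy only the hottest partitions onto
# newly provisioned temporary nodes, then drop the replicas when the nodes
# are returned. Thresholds and data structures are illustrative only.
from collections import Counter

placement = {                      # partition -> nodes holding a copy
    "P1": ["node1"], "P3": ["node2"], "P12": ["node3"],
    "P22": ["node3"], "P34": ["node4"],
}
access_log = ["P12", "P12", "P34", "P12", "P34", "P1"]   # from query stats

def replicate_hot(temp_nodes, top_k=2):
    """Copy the top_k most-accessed partitions onto the temporary nodes."""
    hot = [p for p, _ in Counter(access_log).most_common(top_k)]
    for node in temp_nodes:
        for p in hot:
            placement[p].append(node)
    return hot

def release(temp_nodes):
    """Terminate-instance time: drop every replica on the temp nodes."""
    for p, nodes in placement.items():
        placement[p] = [n for n in nodes if n not in temp_nodes]

hot = replicate_hot(["temp1", "temp2"])
print(hot, placement)   # P12 and P34 now also live on temp1/temp2
release(["temp1", "temp2"])
```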
  • The next big use case is application isolation. CLICK 1: Consider the use case of a Campaign Manager who needs to run a segmentation report which requires heavy usage of the cluster. The non-cloud option for a customer is to keep the application running 24x7, because you never know when the Campaign Manager needs to run this intense report. The other option is to ‘share’ the application across operational reporting and intense analytics. A 24x7 cluster is characterized by high cost; a shared cluster is characterized by maintenance headaches as applications keep running into each other. CLICK 2: Keeping an application running in a large, shared cluster has a dramatic effect on cost: a 50-node application could cost as much as $14,400 per month. JovianDATA brings a third option to the table, which exploits the cloud economics of cheap storage and elastic provisioning of CPUs.
  • In the scenario we have often seen with our customers, the analyst comes in, say, once every fortnight to do deep analytics. CLICK 1: With JovianDATA, the analyst puts in a request for a cluster, and the cluster gets allocated on demand. CLICK 2: The application is then brought to life with a parallel restore from S3. With our parallel restore, we have seen restore times as low as 30 minutes for a 5-10 TB cluster. CLICK 3: At JovianDATA, we do these restores by running parallel jobs which transfer data from S3 into EC2. The restore takes care of network failures as well as software issues. Once the application is fully restored, the analyst can run their intense analytics in complete isolation from other analysts; once their analysis is done, they can de-provision the cluster. CLICK 4: The cost savings of running on-demand clusters for deep analytics are dramatic. A 24/7 cluster would mean thousands of dollars in recurring expenditure; an on-demand cluster requires hundreds of dollars, and only when the analyst really uses the cluster.
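As an illustration of a parallel, failure-tolerant restore from S3, here is a sketch using boto3 and a thread pool; the bucket name, key layout, shard count, and retry policy are all hypothetical:

```python
# Sketch of a parallel, retrying restore from S3 into a fresh cluster, in
# the spirit of the parallel-restore step described above. Identifiers are
# assumptions for illustration, not JovianDATA's actual implementation.
import boto3
from concurrent.futures import ThreadPoolExecutor

BUCKET = "jovian-hibernated-models"          # hypothetical bucket
KEYS = [f"advertiser-42/part-{i:04d}" for i in range(256)]

def fetch(key, attempts=3):
    """Download one model shard, retrying on transient failures."""
    s3 = boto3.client("s3")                  # one client per worker thread
    for attempt in range(1, attempts + 1):
        try:
            s3.download_file(BUCKET, key, f"/restore/{key.replace('/', '_')}")
            return key
        except Exception:
            if attempt == attempts:
                raise                        # surface persistent failures

# Many concurrent streams hide per-object latency; the talk reports ~30
# minutes for a 5-10 TB model with this kind of parallelism.
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(fetch, KEYS))
```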
  • In conclusion, at JovianDATA, we believe three key things are necessary for a ‘real’ cloud computing solution. First, EC2 should be exploited to get near-zero CapEx; if the software is deployed in the classic, CapEx-intensive enterprise model, you are leaving 10x cost savings on the table. Second, EC2's ability to bring up computing in minutes should be exploited to increase performance on demand; if your analytics takes days to improve performance and requires intense DBA intervention, it is not exploiting the cloud. 10x performance should be available in minutes. Third, using S3 for full application hibernation allows EC2 nodes to be de-provisioned; EC2 nodes should be provisioned only when a customer needs to run intense analytics on a periodic basis. Idle nodes are the biggest no-no, and you are leaving 100x cost savings on the table if you are not using S3 effectively.

AWS Customer Presentation - JovianDATA: Presentation Transcript

  • Analytics at the Speed of Thought
2460 North First Street, Suite 170, San Jose, CA 95131 • 408-433-9383 • www.joviandata.com
  • JovianDATA Mission
    Technology platform to optimize your conversion funnel at the lowest cost
  • Why move to the cloud?
    Considering AWS actively
    but not sure about
    • CapEx benefits
    • Current stack’s cloud readiness
    • Application provisioning challenges
  • Introducing JovianDATA
    Ad Server Data, Search Engine Data + Sales/Conversion Data + Site/Web Analytics Data + Customer/3rd Party Data + Other Data Sources
    Actionable Insights: Billions of Impressions, Clicks & Conversions (100’s of TB) • No sampling • Multi-dimensional analytics • In-Flight • Extremely low TCO • Fast Time-to-Value SaaS
  • Transforming Data to Actionable Insights
    Campaign Heat Map (axes: Time, Geography, Publishers; Engagement levels: High / Medium / Low)
    Fully Materialized Data Cube
    • Incremental updates
    • Multi-dimensional indexes
    • Multi-dimensional partitions
  • Agenda
    JovianDATA Company Overview
    JovianInsights – The Power of Analytics
    JovianDATA Cube Storage
    Innovations in Advanced Analytics using commodity clusters
    Analytics Lifecycle Management
    Innovations in Cloud Infrastructure Management
  • Avoiding Expensive Data Processing
    Usage-based automatic view materialization
    Avoid network I/O with multi-dimensional partitioning
    Reduce disk I/O by materializing expensive groups
  • Why move to the cloud?
  • Agenda
    Reducing CapEx
    Dynamic Provisioning
    Application Isolation
  • Managing CapEx with Role Based Clusters
    SINGLE CLUSTER FOR DATA CLEANSING, LOAD AND QUERY
    15TB
    100 NODES
    Monthly Cost = $28,800
  • Managing CapEx with Role Based Clusters
    DATA CLEANSING → LOAD MODEL → HIBERNATE MODEL → QUERY → UI
    Ad Server Data, Search Engine Data
    2 hours daily for load on 10 nodes
    Query on 5 nodes
    Monthly Cost = $2,052
  • Agenda
    Reducing CapEx
    Dynamic Provisioning
    Application Isolation
  • Selective Replication for on-demand performance
    • A power analyst needs to perform complex, heavy number-crunching queries that typically take 8-10 hours
    • Solution: FlexRestore™
    • Adds two new temporary nodes (Temp1, Temp2)
    • Creates new replicas for hot partitions and redistributes them across nodes
    With Replication Factor = 1: Site Section Analytics = 10 minutes (partitions P1, P3, P12, P22, P34 spread across Node1-Node4)
    With Replication Factor = 10: Site Section Analytics = 30 seconds (hot partitions also replicated onto Temp1 and Temp2)
  • Reduce replication to maintain cost
    • When the analysis is done and the extra performance is not needed, the SLA Controller brings down the two temporary nodes (and the extra replicas)
    • Benefits
    • High-performance computing power when you need it
    • But only when you need it, holding down operating costs
    (Temp1 and Temp2 are released; partitions P1, P3, P12, P22, P34 remain distributed across Node1-Node4)
  • Agenda
    Reducing CapEx
    Dynamic Provisioning
    Application Isolation
  • Provision Tera Scale Applications in Minutes
    Without Application Isolation: data for all advertisers is kept ‘live’ on 50 nodes
    Campaign Manager needs to run heavy-duty reports for a Big Advertiser
    50 live nodes per month = $14,400
    FUNNEL ANALYSIS FOR CLIENT
  • Provision Tera Scale Applications in Minutes
    Application is provisioned in parallel from S3/EBS into EC2
    Campaign Manager requests application provisioning for a specific Advertiser
    50 nodes for fortnightly analysis = $320
    FUNNEL ANALYSIS FOR CLIENT
    HIBERNATED MODEL
  • Summary
    Reducing CapEx with Role based Temporary Clusters on EC2
    10x Cost Savings with EC2 usage
    Dynamic Provisioning with Selective Replication on EC2
    10x Performance with EC2 replication
    Application Isolation with Application Hibernation on S3/EBS
    100x Cost Savings with EC2-S3
  • Thank You