LONDON USERGROUP
APRIL 30th
2009
Nicola Cardace
Topics
• Auto-Scaling Using Amazon EC2 and Scalr
• Nginx and Memcached on EC2, a 400% boost!
• NASDAQ exchange re-play on AWS
• Persistent Django on Amazon EC2 and EBS
• Taking Massive Distributed Computing to the
Common Man - Hadoop on Amazon EC2/S3
Auto-Scaling Using
Amazon EC2 and Scalr
Scalr, a redundant, self-curing, self-scaling hosting
solution built on EC2
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1603
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1357&categ
tinyurl.com/4lkr3n
tinyurl.com/c7num9
Scalr source code:
http://scalr.googlecode.com/svn/trunk/
***
Scalr overview
• By using Scalr, you can create a server farm that uses prebuilt AMIs
for load balancing, web servers, and databases. You also can
customize a generic AMI, which you can use to host your actual
application.
• Scalr monitors the health of the entire server farm, ensuring that
instances stay running and that load averages stay below a
configurable threshold. If an instance crashes, another one of the
proper type will be launched and added to the load balancer.
Scalr (2)
• Scalr is an open source, fully redundant, self-curing, and
self-scaling hosting environment that uses Amazon EC2.
• Scalr allows network administrators to create virtual
server farms, using prebuilt components. Scalr uses four
Amazon Machine Images (AMIs): a load balancer, a
database, an application server, and a generic base
image.
• Administrators can preconfigure one machine and, when
the load warrants, bring online additional machines with
the same image, to handle the increased requests.
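The monitoring behaviour described in the Scalr slides (replace crashed instances, add machines when load passes a configurable threshold) can be sketched in a few lines. This is only an illustrative model, not Scalr's code: the `Instance`, `Farm`, and `reconcile` names, the role labels, and the per-role averaging rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    role: str              # hypothetical role label, e.g. "www", "db", "lb"
    load_average: float
    running: bool = True


@dataclass
class Farm:
    max_load: float        # the configurable load-average threshold
    instances: List[Instance] = field(default_factory=list)

    def launch(self, role: str) -> Instance:
        """Stand-in for booting a new machine from the role's image."""
        inst = Instance(role=role, load_average=0.0)
        self.instances.append(inst)
        return inst


def reconcile(farm: Farm) -> List[str]:
    """Run one monitoring pass and return the actions that were taken."""
    actions = []
    # Self-curing: replace every crashed instance with one of the same role.
    for inst in list(farm.instances):
        if not inst.running:
            farm.instances.remove(inst)
            farm.launch(inst.role)
            actions.append(f"replaced crashed {inst.role}")
    # Self-scaling: add a machine to any role whose average load is too high.
    for role in sorted({i.role for i in farm.instances}):
        members = [i for i in farm.instances if i.role == role]
        average = sum(i.load_average for i in members) / len(members)
        if average > farm.max_load:
            farm.launch(role)
            actions.append(f"scaled up {role}")
    return actions


farm = Farm(max_load=2.0, instances=[
    Instance("www", 3.0),
    Instance("www", 2.5),
    Instance("db", 0.5, running=False),
])
actions = reconcile(farm)
print(actions)
```

A real deployment would run such a pass periodically and would also scale down when load drops; that part is left out here.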
Nginx and Memcached on EC2
400% boost!
(with a five-minute config tweak!)
Nginx was originally developed by Igor Sysoev for rambler.ru (the second-largest
Russian website). It is a high-performance HTTP server and reverse
proxy known for its stability, performance, and ease of use. A strong
track record, many useful modules, and an active development
community have earned it a steadily growing user base.
memcached is a high-performance, distributed memory object
caching system, generic in nature, but intended for use in
speeding up dynamic web applications by alleviating database
load.
“Memcached, the darling of every web-developer, is
capable of turning almost any application into a speed-
demon. Benchmarking one of my own Rails applications
resulted in ~850 req/s on commodity, non-optimized
hardware - more than enough in the case of this
application. However, what if we took Mongrel out of the
equation? Nginx, by default, comes prepackaged with the
Memcached module, which allows us to bypass the
Mongrel (from rubyforge) servers and talk to Memcached
directly. Same hardware, and a quick test later: ~3,550
req/s, or almost a 400% improvement!”
http://tinyurl.com/3a7t9y
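As a quick check of the numbers in the quote, the two throughput figures can be compared directly. The short calculation below uses only the ~850 req/s and ~3,550 req/s values quoted above; the variable names are made up for the example.

```python
baseline_rps = 850    # ~850 req/s with Mongrel in the request path
direct_rps = 3550     # ~3,550 req/s with Nginx talking to Memcached directly

# Ratio of the new throughput to the old one.
speedup = direct_rps / baseline_rps
# Relative increase over the baseline, in percent.
increase_pct = (direct_rps - baseline_rps) / baseline_rps * 100

print(f"{speedup:.2f}x the baseline throughput")
print(f"{increase_pct:.0f}% more requests per second")
```

In other words, the new figure is a little over four times the baseline, which is where the headline "400%" comes from.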
***
NASDAQ exchange
re-play on AWS
your homework
Persistent Django on Amazon
EC2 and EBS
Credit:
Thomas Brox Røst,
Visiting researcher, Decision Systems Group, Harvard
Persistent Django
on Amazon EC2 and EBS - The easy way
thomas.broxrost.com
tinyurl.com/6b48g9
Now that Amazon’s Elastic Block Store (EBS) is publicly available,
running a complete Django installation on Amazon Web Services
(AWS) is easier than ever.
---
EBS provides persistent storage, which means that the Django database
is kept safe even after the Django EC2 instances terminate.
To set up Django with a persistent PostgreSQL database on AWS:
Set up an AWS account
Download and install the Elasticfox Firefox extension
Add your AWS credentials to Firefox
Create a new EC2 security group
By default, EC2 instances are an introverted lot: they prefer keeping to themselves and don’t expose any
of their ports to the outside world. We will be running a web application on port 8000, so port
8000 has to be opened. (Normally we would open port 80, but since I will only be using the Django
development web server, port 8000 is preferable.) SSH access is also essential, so port 22 should be
opened as well. To make this happen we must create a new security group in which these ports are open.
Set up a key pair
Launch an EC2 Instance
Connect to your new instance (SSH, for example using PuTTY)
- Install subversion
- Install, initialize and launch PostgreSQL
- Modify PostgreSQL config to avoid username/password problems
- Restart PostgreSQL to enable new security policy
- Set up a database for Django
- Install Django (checkout from SVN)
- Install psycopg2 (for database access from Python)
Set up a Django project
Test the installation
Launch the dev server
Create a Django app
Create and mount an EBS volume
Mount the filesystem
Move the database to persistent storage (with server stopped)
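The security group step in the list above boils down to a small set of inbound rules. The sketch below models them as plain data so the intent is easy to see; it does not call any AWS API, and the `IngressRule` type, the `DJANGO_GROUP` name, and the wide-open `0.0.0.0/0` source range are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IngressRule:
    protocol: str
    port: int
    cidr: str   # source address range allowed to connect


# Hypothetical rule set for the walkthrough: SSH plus the Django dev server.
# 0.0.0.0/0 means "from anywhere"; narrowing the SSH range is usually wiser.
DJANGO_GROUP = [
    IngressRule("tcp", 22, "0.0.0.0/0"),
    IngressRule("tcp", 8000, "0.0.0.0/0"),
]


def is_open(rules: Iterable[IngressRule], protocol: str, port: int) -> bool:
    """Return True if any rule admits traffic on the given protocol and port."""
    return any(r.protocol == protocol and r.port == port for r in rules)


for port in (22, 80, 8000):
    state = "open" if is_open(DJANGO_GROUP, "tcp", port) else "closed"
    print(f"tcp/{port}: {state}")
```

Port 80 stays closed here because the walkthrough only uses the Django development server on port 8000.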
***
AWS
Elastic MapReduce
Amazon Elastic MapReduce
Data and Computing Trends:
Source: Facebook
• Explosion of Data
– Web Logs, Ad-Server logs, Sensor Networks, Seismic Data, DNA
sequences (?)
– User generated content/Web 2.0
– Data as BI => Data as product (Search, Ads, Digg, Quantcast, …)
• Declining Revenue/GB
– Milk @ $3/gallon => $15M / GB
– Ads @ 20c / 10^6 impressions => $1/GB
– Google Analytics, Facebook Lexicon == Free!
• Hardware Trends
– Commodity Rocks: $4K 1U box = 8 cores + 16GB mem + 4x1TB
– CPU: SMP → NUMA, Storage: $ Shared-Nothing << $ Shared,
Networking: Ethernet
Hadoop
• Parallel Computing platform
– Distributed FileSystem (HDFS)
– Parallel Processing model (Map/Reduce)
– Express Computation in any language
– Job execution for Map/Reduce jobs
(scheduling+localization+retries/speculation)
• Open-Source
– Most popular Apache project!
– Highly Extensible Java Stack (@ expense of Efficiency)
– Develop/Test on EC2!
• Ride the commodity curve:
– Cheap (but reliable) shared nothing storage
– Data Local computing (don’t need high speed networks)
– Highly Scalable (@expense of Efficiency)
Hadoop
Map/Reduce Data Flow
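The Map/Reduce processing model can be illustrated with the classic word count, run here in a single Python process. This is a teaching sketch rather than Hadoop code: the function names are invented, and on a real cluster the map and reduce calls would run in parallel on many nodes, with the framework handling the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["Hadoop on EC2", "Hive on Hadoop"])
print(counts)
```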
Hadoop Running MapReduce
In Pictures (Source: Facebook)
[Figure: six nodes, each with its own disks, connected by 1 Gigabit and 4-8 Gigabit network links. Node = DataNode + Map-Reduce]
Why HIVE?
• Large installed base of SQL users
– i.e. map-reduce is for ultra-geeks
– much, much easier to write a SQL query
• Analytics SQL queries translate really well
to map-reduce
• Files are an insufficient data management
abstraction
– Tables, Schemas, Partitions, Indices
Hive Query Language
• Basic SQL
– From clause subquery
– ANSI JOIN (equi-join only)
– Multi-table Insert
– Multi group-by
– Sampling
– Objects traversal
• Extensibility
– Pluggable Map-reduce scripts using
TRANSFORM
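To give a feel for the TRANSFORM extension point, here is a sketch of a script that Hive could stream rows through. By default Hive passes each row to the script as one tab-separated line and reads tab-separated lines back. The table, column, and file names in the comment are hypothetical, as is the domain-extraction logic.

```python
import sys

# Hypothetical HiveQL that would invoke this script (names are made up):
#   SELECT TRANSFORM(user_id, url)
#   USING 'python extract_domain.py'
#   AS (user_id, domain)
#   FROM page_views;


def transform(lines):
    """Turn 'user_id<TAB>url' rows into 'user_id<TAB>domain' rows."""
    for line in lines:
        user_id, url = line.rstrip("\n").split("\t")
        # For 'scheme://host/path' the host is the third '/'-separated piece.
        domain = url.split("/")[2] if "://" in url else url
        yield f"{user_id}\t{domain}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Entry point when Hive streams rows through the script."""
    for row in transform(stdin):
        stdout.write(row + "\n")


sample = ["42\thttp://example.com/page\n", "7\texample.org\n"]
print(list(transform(sample)))
```

A deployed script would end with a call to `main()`; it is left out here so the sketch can be run without piping anything into it.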
Data Warehousing at Facebook
(Scribe is a server for aggregating log data streamed in real time from a large
number of servers. It is designed to be scalable, extensible without client-side
modification, and robust to failure of the network or any specific machine)
[Figure: data flow diagram with the components Web Servers, Scribe Servers, Filers, Hive on Hadoop Cluster, Oracle RAC, and Federated MySQL]
Hadoop Usage @ Facebook
• Data warehouse running Hive
• 600 machines, 4800 cores
• 3200 jobs per day
• 50+ engineers have used Hadoop
• Data statistics:
– Total Data: ~2.5PB
– Net Data added/day: ~15TB
• 6TB of uncompressed source logs
• 4TB of uncompressed dimension data reloaded daily
– Compression Factor ~5x (gzip, more with bzip)
• Usage statistics:
– 3200 jobs/day with 800K map-reduce tasks/day
– 55TB of compressed data scanned daily
– 15TB of compressed output data written to HDFS
– 80 MM compute minutes/day
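A few per-unit figures follow from the statistics on this slide. The calculation below uses only numbers stated above, plus the assumption that 1TB is counted as 1000GB; the variable names are invented for the example.

```python
machines = 600
cores = 4800
jobs_per_day = 3200
tasks_per_day = 800_000
scanned_tb_per_day = 55
GB_PER_TB = 1000          # assumption: decimal terabytes

cores_per_machine = cores / machines
tasks_per_job = tasks_per_day / jobs_per_day
scanned_gb_per_job = scanned_tb_per_day * GB_PER_TB / jobs_per_day

print(f"{cores_per_machine:.0f} cores per machine")
print(f"{tasks_per_job:.0f} map-reduce tasks per job on average")
print(f"{scanned_gb_per_job:.1f} GB of compressed data scanned per job on average")
```

The 8 cores per machine matches the commodity "1U box = 8 cores" mentioned in the hardware trends slide.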
Hadoop Job types @ Facebook
• Production jobs: load data, compute
statistics, detect spam, etc
• Long experiments: machine learning, etc
• Small ad-hoc queries: Hive jobs, sampling
• GOAL: Provide fast response times for
small jobs and guaranteed service levels
for production jobs
Usage patterns in Yahoo
• ETL
– Put large data sources (e.g. log files) onto the Hadoop File System
– Perform aggregations, transformations, normalizations on the data
– Load into RDBMS / data mart
• Reporting and Analytics
– Run canned and ad-hoc queries over large data
– Run analytics and data mining operations on large data
– Produce reports for end-user consumption or loading into data mart
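The ETL pattern above (land raw logs, aggregate them, load the result into an RDBMS) has the following shape. This is a single-process sketch with made-up log lines and table names: on a real grid the raw files would sit in HDFS and the transformation would run as Map/Reduce jobs, and SQLite from the Python standard library merely stands in for the data mart.

```python
import sqlite3
from collections import Counter

# Hypothetical raw log lines: date, method, path, HTTP status.
RAW_LOG = [
    "2009-04-30 GET /index.html 200",
    "2009-04-30 GET /about.html 200",
    "2009-04-30 GET /index.html 404",
]


def extract(lines):
    """Extract: parse each raw line into a record."""
    for line in lines:
        date, _method, path, status = line.split()
        yield {"date": date, "path": path, "status": int(status)}


def transform(records):
    """Transform: count successful hits per (day, path)."""
    hits = Counter()
    for record in records:
        if record["status"] == 200:
            hits[(record["date"], record["path"])] += 1
    return hits


def load(hits, conn):
    """Load: write the aggregates into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_hits (day TEXT, path TEXT, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO page_hits VALUES (?, ?, ?)",
        [(day, path, count) for (day, path), count in hits.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_LOG)), conn)
rows = conn.execute("SELECT path, hits FROM page_hits ORDER BY path").fetchall()
print(rows)
```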
Usage patterns in Yahoo
• Data Processing Pipelines
– Multi-step pipelines for data processing
– Coordination, scheduling, data collection and publishing of feeds
– SLA carrying, regularly scheduled jobs
• Machine Learning & Graph Algorithms
– Traverse large graphs and data sets, building models and classifiers
– Implement machine learning algorithms over massive data sets
• General Back end processing
– Implement significant portions of back-end, batch oriented processing on the grid
– General computation framework
– Simplify back-end architecture
What is Hadoop Pig
Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.
http://www.cloudera.com/hadoop-training-pig-introduction
Thanks to the kind sponsorship
to the AWS LONDON USER
GROUP
from
LONDON USERGROUP
Thank you!
@n1c0la

More Related Content

What's hot

Using Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangUsing Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene Pang
Spark Summit
 
Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0
Sigmoid
 
Simplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & TroubleshootingSimplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & Troubleshooting
DataWorks Summit/Hadoop Summit
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
Spark Summit
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
Hadoop User Group
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
Jason Hubbard
 
Facebook Retrospective - Big data-world-europe-2012
Facebook Retrospective - Big data-world-europe-2012Facebook Retrospective - Big data-world-europe-2012
Facebook Retrospective - Big data-world-europe-2012
Joydeep Sen Sarma
 
Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)
Ilya Ganelin
 
SQL and Search with Spark in your browser
SQL and Search with Spark in your browserSQL and Search with Spark in your browser
SQL and Search with Spark in your browser
DataWorks Summit/Hadoop Summit
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
Dongmin Yu
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data
Omid Vahdaty
 
Emr zeppelin & Livy demystified
Emr zeppelin & Livy demystifiedEmr zeppelin & Livy demystified
Emr zeppelin & Livy demystified
Omid Vahdaty
 
Spark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve LoughranSpark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve Loughran
Spark Summit
 
How to build your query engine in spark
How to build your query engine in sparkHow to build your query engine in spark
How to build your query engine in spark
Peng Cheng
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
Davin Abraham
 
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
Omid Vahdaty
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark Summit
 

What's hot (20)

Using Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangUsing Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene Pang
 
Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0
 
Simplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & TroubleshootingSimplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & Troubleshooting
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
Facebook Retrospective - Big data-world-europe-2012
Facebook Retrospective - Big data-world-europe-2012Facebook Retrospective - Big data-world-europe-2012
Facebook Retrospective - Big data-world-europe-2012
 
Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)
 
Nextag talk
Nextag talkNextag talk
Nextag talk
 
SQL and Search with Spark in your browser
SQL and Search with Spark in your browserSQL and Search with Spark in your browser
SQL and Search with Spark in your browser
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data
 
Emr zeppelin & Livy demystified
Emr zeppelin & Livy demystifiedEmr zeppelin & Livy demystified
Emr zeppelin & Livy demystified
 
Spark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve LoughranSpark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve Loughran
 
How to build your query engine in spark
How to build your query engine in sparkHow to build your query engine in spark
How to build your query engine in spark
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
 
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
 

Viewers also liked

A eficiência energética e a redução de perdas e fugas na gestão das águas
A eficiência energética e a redução de perdas e fugas na gestão das águasA eficiência energética e a redução de perdas e fugas na gestão das águas
A eficiência energética e a redução de perdas e fugas na gestão das águas
mbenquerenca
 
Tribal Moose Power Point V2
Tribal Moose Power Point V2Tribal Moose Power Point V2
Tribal Moose Power Point V2
Word's Out PR
 
Cahier Planète Pme 2010
Cahier Planète Pme 2010Cahier Planète Pme 2010
Cahier Planète Pme 2010
Renaud Favier
 
Teaser WK 2014 Brazilië!
Teaser WK 2014 Brazilië!Teaser WK 2014 Brazilië!
Teaser WK 2014 Brazilië!Kosmopolitan
 
How To Collaborate And Deploy SharePoint
How To Collaborate And Deploy SharePointHow To Collaborate And Deploy SharePoint
How To Collaborate And Deploy SharePoint
Nick Inglis
 
CloudCamp. Paul Hopton, @relayr_cloud - 'The WunderBar - Bootstrapping the In...
CloudCamp. Paul Hopton, @relayr_cloud - 'The WunderBar - Bootstrapping the In...CloudCamp. Paul Hopton, @relayr_cloud - 'The WunderBar - Bootstrapping the In...
CloudCamp. Paul Hopton, @relayr_cloud - 'The WunderBar - Bootstrapping the In...
Chris Purrington
 

Viewers also liked (6)

A eficiência energética e a redução de perdas e fugas na gestão das águas
A eficiência energética e a redução de perdas e fugas na gestão das águasA eficiência energética e a redução de perdas e fugas na gestão das águas
A eficiência energética e a redução de perdas e fugas na gestão das águas
 
Tribal Moose Power Point V2
Tribal Moose Power Point V2Tribal Moose Power Point V2
Tribal Moose Power Point V2
 
Cahier Planète Pme 2010
Cahier Planète Pme 2010Cahier Planète Pme 2010
Cahier Planète Pme 2010
 
Teaser WK 2014 Brazilië!
Teaser WK 2014 Brazilië!Teaser WK 2014 Brazilië!
Teaser WK 2014 Brazilië!
 
How To Collaborate And Deploy SharePoint
How To Collaborate And Deploy SharePointHow To Collaborate And Deploy SharePoint
How To Collaborate And Deploy SharePoint
 
CloudCamp. Paul Hopton, @relayr_cloud - 'The WunderBar - Bootstrapping the In...
CloudCamp. Paul Hopton, @relayr_cloud - 'The WunderBar - Bootstrapping the In...CloudCamp. Paul Hopton, @relayr_cloud - 'The WunderBar - Bootstrapping the In...
CloudCamp. Paul Hopton, @relayr_cloud - 'The WunderBar - Bootstrapping the In...
 

Similar to AWS (Hadoop) Meetup 30.04.09

Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Precisely
 
Application design for the cloud using AWS
Application design for the cloud using AWSApplication design for the cloud using AWS
Application design for the cloud using AWS
Jonathan Holloway
 
Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09
Chris Bunch
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)
Marcel Krcah
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
Arnab Biswas
 
Building a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsBuilding a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for Analysts
Avere Systems
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
Sohil Jain
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
Sohil Jain
 
Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of Twingo
MapR Technologies
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWSTom Laszewski
 
Cloud computing & lamp applications
Cloud computing & lamp applicationsCloud computing & lamp applications
Cloud computing & lamp applications
Corley S.r.l.
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Data Analysis on AWS
Data Analysis on AWSData Analysis on AWS
Data Analysis on AWS
Paolo latella
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
Konstantin Gredeskoul
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network Analytics
Yousun Jeong
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
Spark Summit
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Databricks
 

Similar to AWS (Hadoop) Meetup 30.04.09 (20)

Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Application design for the cloud using AWS
Application design for the cloud using AWSApplication design for the cloud using AWS
Application design for the cloud using AWS
 
Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
 
Building a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsBuilding a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for Analysts
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of Twingo
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
Cloud computing & lamp applications
Cloud computing & lamp applicationsCloud computing & lamp applications
Cloud computing & lamp applications
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Data Analysis on AWS
Data Analysis on AWSData Analysis on AWS
Data Analysis on AWS
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network Analytics
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 

More from Chris Purrington

PaulJohnston CloudCamp London Ethics Climate Change Nov 2019
PaulJohnston CloudCamp London Ethics Climate Change Nov 2019PaulJohnston CloudCamp London Ethics Climate Change Nov 2019
PaulJohnston CloudCamp London Ethics Climate Change Nov 2019
Chris Purrington
 
Lucy Craddock CloudCampLondon - AI Ethics - Bias in Data
Lucy Craddock CloudCampLondon -   AI Ethics - Bias in DataLucy Craddock CloudCampLondon -   AI Ethics - Bias in Data
Lucy Craddock CloudCampLondon - AI Ethics - Bias in Data
Chris Purrington
 
Dr Caitlin McDonald CloudCamp London - Sustainable Digital Ethics through Evo...
Dr Caitlin McDonald CloudCamp London - Sustainable Digital Ethics through Evo...Dr Caitlin McDonald CloudCamp London - Sustainable Digital Ethics through Evo...
Dr Caitlin McDonald CloudCamp London - Sustainable Digital Ethics through Evo...
Chris Purrington
 
Chris Swan Intro CloudCamp London November 2019
Chris Swan Intro CloudCamp London November 2019Chris Swan Intro CloudCamp London November 2019
Chris Swan Intro CloudCamp London November 2019
Chris Purrington
 
@cpswan on what is hybridcloud and shouldn't you have hybridstrategy
@cpswan on what is hybridcloud and shouldn't you have hybridstrategy@cpswan on what is hybridcloud and shouldn't you have hybridstrategy
@cpswan on what is hybridcloud and shouldn't you have hybridstrategy
Chris Purrington
 
CloudCamp. Rhys Sharp Applications & PaaS
CloudCamp. Rhys Sharp   Applications & PaaSCloudCamp. Rhys Sharp   Applications & PaaS
CloudCamp. Rhys Sharp Applications & PaaS
Chris Purrington
 
CloudCamp. Julian Fischer Anynines - migrating a cloud foundry from vm war...
CloudCamp.  Julian Fischer   Anynines - migrating a cloud foundry from vm war...CloudCamp.  Julian Fischer   Anynines - migrating a cloud foundry from vm war...
CloudCamp. Julian Fischer Anynines - migrating a cloud foundry from vm war...
Chris Purrington
 
CloudCamp. Richard Weerasinghe, ElasticBox - 'Cloud-Enabling Enterprise Appli...
CloudCamp. Richard Weerasinghe, ElasticBox - 'Cloud-Enabling Enterprise Appli...CloudCamp. Richard Weerasinghe, ElasticBox - 'Cloud-Enabling Enterprise Appli...
CloudCamp. Richard Weerasinghe, ElasticBox - 'Cloud-Enabling Enterprise Appli...
Chris Purrington
 
CloudCamp. Anthony Stanley - 'The Anatomy of an App.. Everything but the App...
CloudCamp. Anthony Stanley -  'The Anatomy of an App.. Everything but the App...CloudCamp. Anthony Stanley -  'The Anatomy of an App.. Everything but the App...
CloudCamp. Anthony Stanley - 'The Anatomy of an App.. Everything but the App...
Chris Purrington
 
CloudCamp. Philip Carey: 'Grey Cloud' do you pass the Yorkshire Test. A lig...
CloudCamp.  Philip Carey:  'Grey Cloud' do you pass the Yorkshire Test. A lig...CloudCamp.  Philip Carey:  'Grey Cloud' do you pass the Yorkshire Test. A lig...
CloudCamp. Philip Carey: 'Grey Cloud' do you pass the Yorkshire Test. A lig...
Chris Purrington
 
CloudCamp. Danile Power - It's All About Managing the App
CloudCamp. Danile Power -  It's All About Managing the AppCloudCamp. Danile Power -  It's All About Managing the App
CloudCamp. Danile Power - It's All About Managing the AppChris Purrington
 
CloudCamp justin cormack hypervise my app!
CloudCamp   justin cormack    hypervise my app! CloudCamp   justin cormack    hypervise my app!
CloudCamp justin cormack hypervise my app! Chris Purrington
 
Steve chambers cloud psychopaths- cloud camplondon 24.10.12
Steve chambers   cloud psychopaths- cloud camplondon 24.10.12Steve chambers   cloud psychopaths- cloud camplondon 24.10.12
Steve chambers cloud psychopaths- cloud camplondon 24.10.12Chris Purrington
 
Phil wainewright risks of eu clopud strategy cloudcamp london 24.10.12
Phil wainewright  risks of eu clopud strategy   cloudcamp london 24.10.12Phil wainewright  risks of eu clopud strategy   cloudcamp london 24.10.12
Phil wainewright risks of eu clopud strategy cloudcamp london 24.10.12Chris Purrington
 
Chris swan big data - a little analysis - cloud camp london 24.10.12
Chris swan   big data - a little analysis - cloud camp london 24.10.12Chris swan   big data - a little analysis - cloud camp london 24.10.12
Chris swan big data - a little analysis - cloud camp london 24.10.12Chris Purrington
 
Ali khajeh hosseini -plan forcloud - cloudcamp london 24.10.12
Ali khajeh hosseini -plan forcloud - cloudcamp london 24.10.12Ali khajeh hosseini -plan forcloud - cloudcamp london 24.10.12
Ali khajeh hosseini -plan forcloud - cloudcamp london 24.10.12Chris Purrington
 
Joe baguley cloudcamp london intro 24.10.12
Joe baguley   cloudcamp london intro 24.10.12Joe baguley   cloudcamp london intro 24.10.12
Joe baguley cloudcamp london intro 24.10.12Chris Purrington
 
5. shanley cloudcamplondon
5. shanley cloudcamplondon5. shanley cloudcamplondon
5. shanley cloudcamplondonChris Purrington
 
4. james Governor cloud camp july 4 2012
4. james Governor cloud camp july 4 20124. james Governor cloud camp july 4 2012
4. james Governor cloud camp july 4 2012Chris Purrington
 
1. fran bennett 2012 07 04_cloudcamp
1. fran bennett 2012 07 04_cloudcamp1. fran bennett 2012 07 04_cloudcamp
1. fran bennett 2012 07 04_cloudcampChris Purrington
 

AWS (Hadoop) Meetup 30.04.09

  • 2. Topics
      • Auto-Scaling Using Amazon EC2 and Scalr
      • Nginx and Memcached on EC2, a 400% boost!
      • NASDAQ exchange re-play on AWS
      • Persistent Django on Amazon EC2 and EBS
      • Taking Massive Distributed Computing to the Common Man - Hadoop on Amazon EC2/S3
  • 5. Auto-Scaling Using Amazon EC2 and Scalr
      Scalr, a redundant, self-curing, self-scaling hosting solution built on EC2
  • 9. Scalr overview
      • By using Scalr, you can create a server farm that uses prebuilt AMIs for load balancing, web servers, and databases. You can also customize a generic AMI, which you can use to host your actual application.
      • Scalr monitors the health of the entire server farm, ensuring that instances stay running and that load averages stay below a configurable threshold. If an instance crashes, another one of the proper type will be launched and added to the load balancer.
  • 10. Scalr (2)
      • Scalr is an open source, fully redundant, self-curing, and self-scaling hosting environment that uses Amazon EC2.
      • Scalr allows network administrators to create virtual server farms using prebuilt components. Scalr uses four Amazon Machine Images (AMIs): one each for load balancing, databases, and application servers, plus a generic base image.
      • Administrators can preconfigure one machine and, when the load warrants, bring additional machines online with the same image to handle the increased requests.
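The self-curing and self-scaling behaviour described on the two Scalr slides can be sketched roughly as below. This is an illustration only, not Scalr's actual code: the `Instance` record, the `plan_actions` function, and the `min_count` / `max_load` parameters are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    role: str            # hypothetical role label, e.g. "app", "db", "lb"
    healthy: bool        # False once the instance has crashed
    load_average: float  # most recent load average reported by the instance


def plan_actions(instances, role, min_count, max_load):
    """Return the ("launch", role) actions needed for one role in the farm."""
    alive = [i for i in instances if i.role == role and i.healthy]
    actions = []

    # Self-curing: replace crashed instances until the minimum count is met.
    missing = max(0, min_count - len(alive))
    actions.extend([("launch", role)] * missing)

    # Self-scaling: once the farm is whole, add one instance if the average
    # load across healthy instances exceeds the configured threshold.
    if alive and missing == 0:
        average = sum(i.load_average for i in alive) / len(alive)
        if average > max_load:
            actions.append(("launch", role))
    return actions


farm = [Instance("app", True, 3.0), Instance("app", False, 0.0)]
print(plan_actions(farm, "app", min_count=2, max_load=2.0))
```

A real controller would also register new instances with the load balancer and scale back down when load drops; both are left out here to keep the sketch short.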
  • 11. Nginx and Memcached on EC2 400% boost!
  • 12. Nginx and Memcached on EC2 400% boost! (with a five-minute config tweak!)
  • 13. Nginx: originally developed by Igor Sysoev for rambler.ru (the second largest Russian website), it is a high-performance HTTP server / reverse proxy known for its stability, performance, and ease of use. The great track record, a lot of great modules, and an active development community have rightfully earned it a steady uptick of users.
  • 14. memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
      “Memcached, the darling of every web-developer, is capable of turning almost any application into a speed-demon. Benchmarking one of my own Rails applications resulted in ~850 req/s on commodity, non-optimized hardware - more than enough in the case of this application. However, what if we took Mongrel out of the equation? Nginx, by default, comes prepackaged with the Memcached module, which allows us to bypass the Mongrel (from rubyforge) servers and talk to Memcached directly. Same hardware, and a quick test later: ~3,550 req/s, or almost a 400% improvement!”
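The idea in the quote, answering from the cache first and only falling through to the application server on a miss, can be sketched as follows. This is a simplified illustration in Python rather than an Nginx configuration: `make_handler` and `backend` are hypothetical names, and a plain dict stands in for Memcached.

```python
def make_handler(cache, backend):
    """Build a request handler that tries the cache before the backend."""

    def handle(uri):
        body = cache.get(uri)
        if body is not None:
            # Cache hit: the application server is never touched.
            return body, "cache"
        # Cache miss: fall through to the application server.
        body = backend(uri)
        # Store the response so the next request for this URI is a hit.
        cache[uri] = body
        return body, "backend"

    return handle


backend_calls = []


def slow_backend(uri):
    backend_calls.append(uri)  # record how often the backend is hit
    return f"page for {uri}"


handle = make_handler({}, slow_backend)
print(handle("/home"))  # first request goes to the backend
print(handle("/home"))  # second request is served from the cache
```

One simplification to note: Nginx's Memcached module only reads from the cache, so in the setup the quote describes it is the application that writes entries; the sketch folds that write into the handler for brevity. By the quoted figures, 3,550 / 850 is roughly a 4.2x throughput ratio.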
  • 16. Nginx and Memcached on EC2 400% boost! http://tinyurl.com/3a7t9y ***
  • 17. NASDAQ exchange re-play on AWS: your homework
  • 20. Persistent Django on Amazon EC2 and EBS
  • 22. Persistent Django on Amazon EC2 and EBS - The easy way
      Credit: Thomas Brox Røst, Visiting researcher, Decision Systems Group, Harvard
      thomas.broxrost.com
      tinyurl.com/6b48g9
  • 23. Now that Amazon’s Elastic Block Store (EBS) is publicly available, running a complete Django installation on Amazon Web Services (AWS) is easier than ever.
      EBS provides persistent storage, which means that the Django database is kept safe even after the Django EC2 instances terminate.
  • 24. To set up Django with a persistent PostgreSQL database on AWS:
      – Set up an AWS account
      – Download and install the Elasticfox Firefox extension
      – Add your AWS credentials to Firefox
      – Create a new EC2 security group
      By default, EC2 instances are an introverted lot: they prefer keeping to themselves and don’t expose any of their ports to the outside world. We will be running a web application on port 8000, so port 8000 has to be opened. (Normally we would open port 80, but since I will only be using the Django development web server, port 8000 is preferable.) SSH access is also essential, so port 22 should be opened as well. To make this happen we must create a new security group where these ports are opened.
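The slide does this step through the Elasticfox GUI. As a rough scripted equivalent, the two rules it calls for (SSH on port 22, the Django development server on port 8000) could be described as data like this. The `ingress_rules` helper and the open `0.0.0.0/0` source range are assumptions made for the example, not something the original walkthrough specifies.

```python
def ingress_rules(ports, cidr="0.0.0.0/0"):
    """Build one inbound TCP rule per port, open to the given source range."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,   # single-port range: FromPort == ToPort
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in ports
    ]


# Port 22 for SSH, port 8000 for the Django development server.
rules = ingress_rules([22, 8000])
for rule in rules:
    print(rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
```

The dictionaries follow the shape that boto3's EC2 `authorize_security_group_ingress` call accepts as `IpPermissions`, so they could be passed along with a security group ID; that wiring is an assumption and is not shown. Restricting `cidr` to your own address range is usually wiser than leaving SSH open to everyone.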
  • 25.
      – Set up a key pair
      – Launch an EC2 instance
      – Connect to your new instance (SSH using PuTTY)
          - Install Subversion
          - Install, initialize and launch PostgreSQL
          - Modify the PostgreSQL config to avoid username/password problems
          - Restart PostgreSQL to enable the new security policy
          - Set up a database for Django
          - Install Django (checkout from SVN)
          - Install psycopg2 (for database access from Python)
      – Set up a Django project
      – Test the installation
      – Launch the dev server
      – Create a Django app
      – Create and attach an EBS volume
      – Mount the filesystem
      – Move the database to persistent storage (with the server stopped)
  • 26. ***
  • 39. Data and Computing Trends (Source: Facebook)
      • Explosion of Data
          – Web Logs, Ad-Server logs, Sensor Networks, Seismic Data, DNA sequences (?)
          – User generated content/Web 2.0
          – Data as BI => Data as product (Search, Ads, Digg, Quantcast, …)
      • Declining Revenue/GB
          – Milk @ $3/gallon => $15M / GB
          – Ads @ 20c / 10^6 impressions => $1/GB
          – Google Analytics, Facebook Lexicon == Free!
      • Hardware Trends
          – Commodity Rocks: $4K 1U box = 8 cores + 16GB mem + 4x1TB
          – CPU: SMP → NUMA, Storage: $ Shared-Nothing << $ Shared, Networking: Ethernet
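The revenue-per-GB figures on this slide can be reproduced with a few lines of arithmetic. The 200 bytes per transaction comes from the editor's notes; applying the same record size to ad impressions, and taking 1 GB as 10^9 bytes, are assumptions made here so that the numbers line up.

```python
BYTES_PER_GB = 10**9     # assumption: decimal gigabyte
BYTES_PER_RECORD = 200   # from the editor's notes; assumed for impressions too


def revenue_per_gb(revenue_per_record):
    """Revenue carried by one gigabyte of fixed-size records."""
    records_per_gb = BYTES_PER_GB // BYTES_PER_RECORD  # 5,000,000 records
    return records_per_gb * revenue_per_record


milk = revenue_per_gb(3.00)          # $3 per one-gallon transaction
ads = revenue_per_gb(0.20 / 10**6)   # 20 cents per million impressions

print(f"milk: ${milk:,.0f} per GB")
print(f"ads:  ${ads:,.2f} per GB")
```

Under those assumptions a gigabyte holds five million records, which gives about $15M per GB for milk sales and about $1 per GB for ad impressions, matching the slide.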
  • 41. Hadoop
      • Parallel Computing platform
          – Distributed FileSystem (HDFS)
          – Parallel Processing model (Map/Reduce)
          – Express Computation in any language
          – Job execution for Map/Reduce jobs (scheduling+localization+retries/speculation)
      • Open-Source
          – Most popular Apache project!
          – Highly Extensible Java Stack (@ expense of Efficiency)
          – Develop/Test on EC2!
      • Ride the commodity curve:
          – Cheap (but reliable) shared nothing storage
          – Data Local computing (don’t need high speed networks)
          – Highly Scalable (@ expense of Efficiency)
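To make the Map/Reduce processing model concrete, here is a minimal single-process word count in plain Python. It only mimics the three phases (map, shuffle, reduce); it does not use Hadoop, and the function names are chosen for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["Hadoop on EC2", "hadoop on S3"]))
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves data across the network; here everything happens in one process purely to show the data flow.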
  • 45. In Pictures (Source: Facebook)
  • 46. Looks like this: [Diagram: six nodes, each with local disks; network links labeled "1 Gigabit" and "4-8 Gigabit"] Node = DataNode + Map-Reduce
  • 47. Why HIVE?
      • Large installed base of SQL users
          – i.e. map-reduce is for ultra-geeks
          – much, much easier to write a SQL query
      • Analytics SQL queries translate really well to map-reduce
      • Files are an insufficient data management abstraction
          – Tables, Schemas, Partitions, Indices
  • 49. Hive Query Language
      • Basic SQL
          – From clause subquery
          – ANSI JOIN (equi-join only)
          – Multi-table Insert
          – Multi group-by
          – Sampling
          – Objects traversal
      • Extensibility
          – Pluggable Map-reduce scripts using TRANSFORM
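As an illustration of the TRANSFORM extensibility point, a pluggable script is just a program that reads tab-separated rows on standard input and writes tab-separated rows on standard output. The sketch below is a hypothetical example: the two-column input (`user_id`, `url`) and the domain-extraction logic are invented for illustration and do not come from the slides.

```python
import sys


def transform(lines):
    """Turn each tab-separated (user_id, url) row into (user_id, domain)."""
    for line in lines:
        user_id, url = line.rstrip("\n").split("\t")
        # Drop the scheme (text before "//"), then keep the host part only.
        domain = url.split("//")[-1].split("/")[0]
        yield f"{user_id}\t{domain}"


# When a row stream is piped in, process it; skip when run interactively.
if __name__ == "__main__" and not sys.stdin.isatty():
    for row in transform(sys.stdin):
        print(row)
```

In Hive such a script would be referenced from a query along the lines of `SELECT TRANSFORM (user_id, url) USING 'python to_domain.py' AS (user_id, domain) FROM page_views;`, where the script, table, and column names are all hypothetical.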
  • 50. Data Warehousing at Facebook
      (Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.)
      [Diagram components: Web Servers, Scribe Servers, Filers, Hive on Hadoop Cluster, Oracle RAC, Federated MySQL]
  • 51. Hadoop Usage @ Facebook
      • Data warehouse running Hive
      • 600 machines, 4800 cores
      • 3200 jobs per day
      • 50+ engineers have used Hadoop
      • Data statistics:
          – Total Data: ~2.5PB
          – Net Data added/day: ~15TB
              • 6TB of uncompressed source logs
              • 4TB of uncompressed dimension data reloaded daily
          – Compression Factor ~5x (gzip, more with bzip)
      • Usage statistics:
          – 3200 jobs/day with 800K tasks (map-reduce tasks)/day
          – 55TB of compressed data scanned daily
          – 15TB of compressed output data written to HDFS
          – 80 MM compute minutes/day
  • 52. Hadoop Job types @ Facebook
      • Production jobs: load data, compute statistics, detect spam, etc.
      • Long experiments: machine learning, etc.
      • Small ad-hoc queries: Hive jobs, sampling
      • GOAL: Provide fast response times for small jobs and guaranteed service levels for production jobs
  • 53. Usage patterns in Yahoo
      • ETL
          – Put large data source (e.g. log files) onto the Hadoop File System
          – Perform aggregations, transformations, normalizations on the data
          – Load into RDBMS / data mart
      • Reporting and Analytics
          – Run canned and ad-hoc queries over large data
          – Run analytics and data mining operations on large data
          – Produce reports for end-user consumption or loading into data mart
  • 54. Usage patterns in Yahoo
      • Data Processing Pipelines
          – Multi-step pipelines for data processing
          – Coordination, scheduling, data collection and publishing of feeds
          – SLA carrying, regularly scheduled jobs
      • Machine Learning & Graph Algorithms
          – Traverse large graphs and data sets, building models and classifiers
          – Implement machine learning algorithms over massive data sets
      • General Back end processing
          – Implement significant portions of back-end, batch oriented processing on the grid
          – General computation framework
          – Simplify back-end architecture
  • 55. What is Hadoop Pig?
      Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
      http://www.cloudera.com/hadoop-training-pig-introduction
  • 58. Thanks to the kind sponsorship to the AWS LONDON USER GROUP from

Editor's Notes

  1. 200 bytes/transaction. Milk: assuming each transaction is for 1 gallon. Who needs another programming language (PL/SQL)? Gotchas later on (about networking trends). Anyone can rent a computer! (UC Berkeley)
  2. UC Berkeley EC2 example
  3. UC Berkeley EC2 example
  4. Point out that now we know how HDFS works – we can run maps close to data
  5. Point out that now we know how HDFS works – we can run maps close to data
  6. Point out that now we know how HDFS works – we can run maps close to data
  7. Nomenclature: Core switch and Top of Rack
  8. Simple map-reduce is easy – but it can get complicated very quickly.
  9. Multi table inserts and multi group by’s allow us to reduce the number of scans required. Poor man’s alternative to MQO.