SlideShare a Scribd company logo
1 of 33
Download to read offline
Presto and ScyllaDB
Tzach Livyatan Eyal Gutkind
ScyllaDB ScyllaDB
Agenda
• What is Presto?
• Why Presto?
• Scylla + Presto
▸ Connector
▸ Examples
• What’s Next?
What is Presto?
• Distributed ANSI SQL Query Engine for Big Data
• Developed by Facebook in 2012
• Data sources: HDFS, S3, Cassandra, MySQL, Kafka,
PostgreSQL, Redis and Scylla
• Open Source
• Java Base
Why Presto?
Presto
Interactive Data
Exploration
Source http://techblog.netflix.com/2014/10/usi ng-presto-in-our-big-data-platform.html
Spark
Real Time Analytics
Machine Learning
Iterative
Why Presto?
• ANSI SQL
• Extensible - multiple data
sources
• Fast (compare to hive)
• Custom engine designed to
support SQL semantics
Source http://techblog.netflix.com/2014/10/usi ng-presto-in-our-big-data-platform.html
Who use Presto ?
Presto Architecture
Source https://en.wikipedia.org/wiki/Presto_(SQL_query_engine)
Presto Cassandra Connector
• DataStream API:
▪ CQL
• DataLocation API
▪ Thrift: describe_ring and describe_splits_ex verbs
If number of partitions > 200
• Metadata API
▪ CQL: get table layout
Presto Cassandra Connector -
Configuration
• cassandra.contact-points
• cassandra.consistency-level
• cassandra.username / cassandra.password
• limit-for-partition-key-select
• cassandra.fetch-size
• cassandra.load-policy.use-dc-aware (default
false), dc-aware.local-dc
All: https://prestodb.io/docs/current/connector/cassandra.html
Scylla Example - CQL
CREATE KEYSPACE mykeyspace WITH REPLICATION =
{ 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
use mykeyspace ;
CREATE TABLE users
(user_id int PRIMARY KEY, fname text, lname text);
insert into users (user_id , fname, lname) values (1, 'tzach', 'livyatan');
insert into users (user_id , fname, lname) values (2, 'dor', 'laor');
insert into users (user_id , fname, lname) values (3, 'shlomi', 'laor');
insert into users (user_id , fname, lname) values (4, 'shlomi', 'livne');
insert into users (user_id , fname, lname) values (6, 'avi', 'kivity');
Scylla Example - Presto SQL
SELECT * FROM cassandra.mykeyspace.users where user_id >=
2 and user_id <= 3;
user_id | fname | lname
---------+--------+-------
2 | dor | laor
3 | shlomi | laor
(2 rows)
Scylla / Presto Example
CREATE TABLE air_quality_data (
... sensor_id text,
... time timestamp,
... co_ppm int,
... PRIMARY KEY (sensor_id, time)
... );
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home',
'2016-08-30 07:01:00', 17);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home',
'2016-08-30 07:01:01', 18);
Scylla / Presto Example
> select histogram(co_ppm) as hist from
cassandra.mykeyspace.air_quality_data where sensor_id='my_home';
hist
--------------------------------
{17=1, 18=1, 19=1, 20=2, 31=1}
select sensor_id, avg(co_ppm) as AVG from
cassandra.mykeyspace.air_quality_data group by sensor_id;
sensor_id | avg
-----------+--------------------
your_home | 629.2857142857143
my_home | 20.833333333333332
Scylla / Presto Example - Join
cqlsh:mykeyspace> CREATE TABLE address (user_id int PRIMARY KEY, number
int, street text, city text);
cqlsh:mykeyspace> insert into address (user_id , number, street) values (1,
'dizingof', 'tel aviv');
presto:default> select * from cassandra.mykeyspace.users us JOIN
cassandra.mykeyspace.address ad ON us.user_id = ad.user_id;
user_id | fname | lname | user_id | city | number | street
---------+-------+----------+---------+----------+--------+----------
1 | tzach | livyatan | 1 | tel aviv | 100 | dizingof
Try for yourself!
docker run --name some-scylla-presto -d tzachl/scylla-and-presto-image
docker exec -it some-scylla-presto cqlsh
cqlsh>
docker exec -it some-scylla-presto ./presto --server localhost:8080 --
catalog cassandra --schema default
presto:default>
Presto SQL - Aggregate
Functions examples
• checksum - an order-insensitive checksum of the given values.
• count - number of input rows.
• count_if - number of TRUE input values
• max_by(x, y) - value of x associated with the maximum value of y over all
input values.
• histogram(x) - a map containing the count of the number of times each input
value occurs.
• approx_distinct(x) - approximate number of distinct input values. This
function provides an approximation of count(DISTINCT x). Zero is returned if
all input values are null.
• corr(y, x) - correlation coefficient of input values.
• stddev_samp(x) - sample standard deviation of all input values.
Airpal
Airbnb A web-based query execution tool built on top
of Facebook's PrestoDB.
Source https://blogs.aws.amazon.com/bigdata/post/Tx1BF2DN6KRFI27/Analyze-Data-with-Presto-and-Airpal-on-Amazon-EMR
Whats Next?
• Try it yourself https://hub.docker.com/r/tzachl/scylla-and-
presto-image
• Performance testing
• Future Optimizations
▪ Push query lower
▪ Pull data faster
Agenda
• What is Spark?
• Why Spark?
• Scylla + Spark
• What’s Next?
What is Spark?
What if I don’t Know?
Find me at the happy hour tonight, we have beer and wine!
Why Spark and Scylla?
• Faster Analytics with In-memory execution
• Close to Real-Time analytics on transactional data
• Iterative
• Resource efficiency for multiple workloads
Spark Architecture
Source: http://spark.apache.or g/docs/latest/cluster-overview.html
Spark & Spark RDDs @ ScyllaDB
(Resilient Distributed Datasets)
• Understand your data model
CREATE TABLE sensordata (s1data text,s2data text,timestmp timestamp,sregion
text,stype text, PRIMARY KEY (s1data, s2data,timestmp));
CREATE TABLE sensordata (s1data text,s2data text,timestmp timestamp,sregion
text,stype text, PRIMARY KEY (timestmp, s1data, s2data));
Data Model Impact on Data Load
cassandra-stress user profile=myexample.yaml no-warmup ops(insert=1) n=1000000 -rate threads=1000 -node $node
PRIMARY KEY (s1data, s2data,timestmp)); PRIMARY KEY (timestmp, s1data, s2data));
Try Maintain Data Locality - Colo.
Data Locality - Dedicated Clusters
Review your settings for:
Spark.locality.wait.*
(node, process, rack)
→ Default is 3s
Network Speed and Latency
Spark & ScyllaDB, CPU Settings
@Scylla
@Spark
--cpu-set
SPARK_WORKER_CORES
--smp
• Divide system cores based on expected workload
Spark & Scylla, Memory Settings
(Resilient Distributed Datasets)
@Spark
SPARK_WORKER_MEMORY , per worker node
spark.executor.memory , When you set your specific application
memory consumption
@Scylla
Will just take whatever you give us :)
Spark Building and C* Connector
• Environment used in above examples:
▪ AWS i2.8xlarge x3 for Scylla & Spark (colo)
- Scylla 1.3, Spark 1.6.2
- Mvn 3.3.9
- Building Spark standalone cluster on EC2
▪ Each server has 32 cores
- 24 cores → Scylla
- 8 cores → Spark
▪ Each server has 244GB RAM
- 64GB for each Spark worker
- 128GB for Scylla
• Use Apache Cassandra connector Ver. 1.6
• Using Spark Standalone Cluster deployment
• Think Resources - CPU and Memory
• Better modeling, easier deployment, faster analytics
• Data locality can be a blessing if managed correctly
▪ Scylla’s optimal sharding enable data ingestion without
compromising analytics performance
Whats Next?
• Try it yourself
▪ Here is how to do it:
http://www.scylladb.com/kb/scylla-and-spark-
integration/
• Performance testing and use cases
Tell us about your experience with Scylla and Spark
Thank You!
Try Presto with Scylla now:
https://hub.docker.com/r/tzachl/scylla-
and-presto-image/
Contact: tzach@scylladb.com

More Related Content

What's hot

Critical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency DatabaseCritical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency DatabaseScyllaDB
 
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph DatabaseFireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph DatabaseScyllaDB
 
How to Monitor and Size Workloads on AWS i3 instances
How to Monitor and Size Workloads on AWS i3 instancesHow to Monitor and Size Workloads on AWS i3 instances
How to Monitor and Size Workloads on AWS i3 instancesScyllaDB
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScyllaDB
 
Scylla Virtual Workshop 2020
Scylla Virtual Workshop 2020Scylla Virtual Workshop 2020
Scylla Virtual Workshop 2020ScyllaDB
 
Scylla: 1 Million CQL operations per second per server
Scylla: 1 Million CQL operations per second per serverScylla: 1 Million CQL operations per second per server
Scylla: 1 Million CQL operations per second per serverAvi Kivity
 
The True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsThe True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsScyllaDB
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightScyllaDB
 
Webinar: Does it Still Make Sense to do Big Data with Small Nodes?
Webinar: Does it Still Make Sense to do Big Data with Small Nodes?Webinar: Does it Still Make Sense to do Big Data with Small Nodes?
Webinar: Does it Still Make Sense to do Big Data with Small Nodes?Julia Angell
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterScyllaDB
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedJ On The Beach
 
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Data Con LA
 
Developing Scylla Applications: Practical Tips
Developing Scylla Applications: Practical TipsDeveloping Scylla Applications: Practical Tips
Developing Scylla Applications: Practical TipsScyllaDB
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
 
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...Data Con LA
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...DataStax
 
Renegotiating the boundary between database latency and consistency
Renegotiating the boundary between database latency  and consistencyRenegotiating the boundary between database latency  and consistency
Renegotiating the boundary between database latency and consistencyScyllaDB
 
Webinar: Using Control Theory to Keep Compactions Under Control
Webinar: Using Control Theory to Keep Compactions Under ControlWebinar: Using Control Theory to Keep Compactions Under Control
Webinar: Using Control Theory to Keep Compactions Under ControlScyllaDB
 
Scylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScyllaDB
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScyllaDB
 

What's hot (20)

Critical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency DatabaseCritical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency Database
 
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph DatabaseFireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
 
How to Monitor and Size Workloads on AWS i3 instances
How to Monitor and Size Workloads on AWS i3 instancesHow to Monitor and Size Workloads on AWS i3 instances
How to Monitor and Size Workloads on AWS i3 instances
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the Database
 
Scylla Virtual Workshop 2020
Scylla Virtual Workshop 2020Scylla Virtual Workshop 2020
Scylla Virtual Workshop 2020
 
Scylla: 1 Million CQL operations per second per server
Scylla: 1 Million CQL operations per second per serverScylla: 1 Million CQL operations per second per server
Scylla: 1 Million CQL operations per second per server
 
The True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsThe True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS Options
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
 
Webinar: Does it Still Make Sense to do Big Data with Small Nodes?
Webinar: Does it Still Make Sense to do Big Data with Small Nodes?Webinar: Does it Still Make Sense to do Big Data with Small Nodes?
Webinar: Does it Still Make Sense to do Big Data with Small Nodes?
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous Speed
 
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
 
Developing Scylla Applications: Practical Tips
Developing Scylla Applications: Practical TipsDeveloping Scylla Applications: Practical Tips
Developing Scylla Applications: Practical Tips
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
 
Renegotiating the boundary between database latency and consistency
Renegotiating the boundary between database latency  and consistencyRenegotiating the boundary between database latency  and consistency
Renegotiating the boundary between database latency and consistency
 
Webinar: Using Control Theory to Keep Compactions Under Control
Webinar: Using Control Theory to Keep Compactions Under ControlWebinar: Using Control Theory to Keep Compactions Under Control
Webinar: Using Control Theory to Keep Compactions Under Control
 
Scylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and Scylla
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent Databases
 

Viewers also liked

Scylla Summit 2016: ScyllaDB, Present and Future
Scylla Summit 2016: ScyllaDB, Present and FutureScylla Summit 2016: ScyllaDB, Present and Future
Scylla Summit 2016: ScyllaDB, Present and FutureScyllaDB
 
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with ScyllaScylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with ScyllaScyllaDB
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScyllaDB
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
 
Scylla Summit 2016: Keynote - Big Data Goes Native
Scylla Summit 2016: Keynote - Big Data Goes NativeScylla Summit 2016: Keynote - Big Data Goes Native
Scylla Summit 2016: Keynote - Big Data Goes NativeScyllaDB
 
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in GoScylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in GoScyllaDB
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine kiran palaka
 
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Matt Fuller
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 
Pygrib documentation
Pygrib documentationPygrib documentation
Pygrib documentationArulalan T
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Matt Fuller
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupWojciech Biela
 
Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetupstevemcpherson
 
Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Wojciech Biela
 
AWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWSAWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWSChris Riddell
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Cloudera, Inc.
 

Viewers also liked (20)

Scylla Summit 2016: ScyllaDB, Present and Future
Scylla Summit 2016: ScyllaDB, Present and FutureScylla Summit 2016: ScyllaDB, Present and Future
Scylla Summit 2016: ScyllaDB, Present and Future
 
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with ScyllaScylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDS
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
 
Scylla Summit 2016: Keynote - Big Data Goes Native
Scylla Summit 2016: Keynote - Big Data Goes NativeScylla Summit 2016: Keynote - Big Data Goes Native
Scylla Summit 2016: Keynote - Big Data Goes Native
 
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in GoScylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
 
Presto - SQL on anything
Presto  - SQL on anythingPresto  - SQL on anything
Presto - SQL on anything
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Pygrib documentation
Pygrib documentationPygrib documentation
Pygrib documentation
 
Presto overview
Presto overviewPresto overview
Presto overview
 
Presto
PrestoPresto
Presto
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
 
Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetup
 
Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.
 
AWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWSAWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWS
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
 

Similar to Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla

Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingDatabricks
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataVictor Coustenoble
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksDatabricks
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIDatabricks
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparisonshsedghi
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Databasejavier ramirez
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Databricks
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Robbie Strickland
 
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkWriting Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkDatabricks
 
Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Matthias Niehoff
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformEva Tse
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaHelena Edelson
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Databricks
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...Amazon Web Services
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databasejavier ramirez
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQLYousun Jeong
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksDatabricks
 

Similar to Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla (20)

Presentation
PresentationPresentation
Presentation
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark API
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparison
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Database
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkWriting Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark
 
Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series database
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Recently uploaded

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla

  • 1. Presto and ScyllaDB Tzach Livyatan Eyal Gutkind ScyllaDB ScyllaDB
  • 2. Agenda • What is Presto? • Why Presto? • Scylla + Presto ▸ Connector ▸ Examples • What’s Next?
  • 3. What is Presto? • Distributed ANSI SQL Query Engine for Big Data • Developed by Facebook in 2012 • Data sources: HDFS, S3, Cassandra, MySQL, Kafka, PostgreSQL, Redis and Scylla • Open Source • Java Base
  • 4. Why Presto? Presto Interactive Data Exploration Source http://techblog.netflix.com/2014/10/usi ng-presto-in-our-big-data-platform.html Spark Real Time Analytics Machine Learning Iterative
  • 5. Why Presto? • ANSI SQL • Extensible - multiple data sources • Fast (compare to hive) • Custom engine designed to support SQL semantics Source http://techblog.netflix.com/2014/10/usi ng-presto-in-our-big-data-platform.html
  • 8. Presto Cassandra Connector • DataStream API: ▪ CQL • DataLocation API ▪ Thrift: describe_ring and describe_splits_ex verbs If number of partitions > 200 • Metadata API ▪ CQL: get table layout
  • 9. Presto Cassandra Connector - Configuration • cassandra.contact-points • cassandra.consistency-level • cassandra.username / cassandra.password • limit-for-partition-key-select • cassandra.fetch-size • cassandra.load-policy.use-dc-aware (default false), dc-aware.local-dc All: https://prestodb.io/docs/current/connector/cassandra.html
  • 10. Scylla Example - CQL CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }; use mykeyspace ; CREATE TABLE users (user_id int PRIMARY KEY, fname text, lname text); insert into users (user_id , fname, lname) values (1, 'tzach', 'livyatan'); insert into users (user_id , fname, lname) values (2, 'dor', 'laor'); insert into users (user_id , fname, lname) values (3, 'shlomi', 'laor'); insert into users (user_id , fname, lname) values (4, 'shlomi', 'livne'); insert into users (user_id , fname, lname) values (6, 'avi', 'kivity');
  • 11. Scylla Example - Presto SQL SELECT * FROM cassandra.mykeyspace.users where user_id >= 2 and user_id <= 3; user_id | fname | lname ---------+--------+------- 2 | dor | laor 3 | shlomi | laor (2 rows)
  • 12. Scylla / Presto Example CREATE TABLE air_quality_data ( ... sensor_id text, ... time timestamp, ... co_ppm int, ... PRIMARY KEY (sensor_id, time) ... ); INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:00', 17); INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:01', 18);
  • 13. Scylla / Presto Example > select histogram(co_ppm) as hist from cassandra.mykeyspace.air_quality_data where sensor_id='my_home'; hist -------------------------------- {17=1, 18=1, 19=1, 20=2, 31=1} select sensor_id, avg(co_ppm) as AVG from cassandra.mykeyspace.air_quality_data group by sensor_id; sensor_id | avg -----------+-------------------- your_home | 629.2857142857143 my_home | 20.833333333333332
  • 14. Scylla / Presto Example - Join cqlsh:mykeyspace> CREATE TABLE address (user_id int PRIMARY KEY, number int, street text, city text); cqlsh:mykeyspace> insert into address (user_id , number, street) values (1, 'dizingof', 'tel aviv'); presto:default> select * from cassandra.mykeyspace.users us JOIN cassandra.mykeyspace.address ad ON us.user_id = ad.user_id; user_id | fname | lname | user_id | city | number | street ---------+-------+----------+---------+----------+--------+---------- 1 | tzach | livyatan | 1 | tel aviv | 100 | dizingof
  • 15. Try for yourself! docker run --name some-scylla-presto -d tzachl/scylla-and-presto-image docker exec -it some-scylla-presto cqlsh cqlsh> docker exec -it some-scylla-presto ./presto --server localhost:8080 -- catalog cassandra --schema default presto:default>
  • 16. Presto SQL - Aggregate Functions examples • checksum - an order-insensitive checksum of the given values. • count - number of input rows. • count_if - number of TRUE input values • max_by(x, y) - value of x associated with the maximum value of y over all input values. • histogram(x) - a map containing the count of the number of times each input value occurs. • approx_distinct(x) - approximate number of distinct input values. This function provides an approximation of count(DISTINCT x). Zero is returned if all input values are null. • corr(y, x) - correlation coefficient of input values. • stddev_samp(x) - sample standard deviation of all input values.
  • 17. Airpal Airbnb A web-based query execution tool built on top of Facebook's PrestoDB. Source https://blogs.aws.amazon.com/bigdata/post/Tx1BF2DN6KRFI27/Analyze-Data-with-Presto-and-Airpal-on-Amazon-EMR
  • 18. Whats Next? • Try it yourself https://hub.docker.com/r/tzachl/scylla-and- presto-image • Performance testing • Future Optimizations ▪ Push query lower ▪ Pull data faster
  • 19. Agenda • What is Spark? • Why Spark? • Scylla + Spark • What’s Next?
  • 21. What if I don’t Know? Find me at the happy hour tonight, we have beer and wine!
  • 22. Why Spark and Scylla? • Faster Analytics with In-memory execution • Close to Real-Time analytics on transactional data • Iterative • Resource efficiency for multiple workloads
  • 23. Spark Architecture Source: http://spark.apache.or g/docs/latest/cluster-overview.html
  • 24. Spark & Spark RDDs @ ScyllaDB (Resilient Distributed Datasets) • Understand your data model CREATE TABLE sensordata (s1data text,s2data text,timestmp timestamp,sregion text,stype text, PRIMARY KEY (s1data, s2data,timestmp)); CREATE TABLE sensordata (s1data text,s2data text,timestmp timestamp,sregion text,stype text, PRIMARY KEY (timestmp, s1data, s2data));
  • 25. Data Model Impact on Data Load cassandra-stress user profile=myexample.yaml no-warmup ops(insert=1) n=1000000 -rate threads=1000 -node $node PRIMARY KEY (s1data, s2data,timestmp)); PRIMARY KEY (timestmp, s1data, s2data));
  • 26. Try Maintain Data Locality - Colo.
  • 27. Data Locality - Dedicated Clusters Review your settings for: Spark.locality.wait.* (node, process, rack) → Default is 3s Network Speed and Latency
  • 28. Spark & ScyllaDB, CPU Settings @Scylla @Spark --cpu-set SPARK_WORKER_CORES --smp • Divide system cores based on expected workload
  • 29. Spark & Scylla, Memory Settings (Resilient Distributed Datasets) @Spark SPARK_WORKER_MEMORY , per worker node spark.executor.memory , When you set your specific application memory consumption @Scylla Will just take whatever you give us :)
  • 30. Spark Building and C* Connector • Environment used in above examples: ▪ AWS i2.8xlarge x3 for Scylla & Spark (colo) - Scylla 1.3, Spark 1.6.2 - Mvn 3.3.9 - Building Spark standalone cluster on EC2 ▪ Each server has 32 cores - 24 cores → Scylla - 8 cores → Spark ▪ Each server has 244GB RAM - 64GB for each Spark worker - 128GB for Scylla • Use Apache Cassandra connector Ver. 1.6
  • 31. • Using Spark Standalone Cluster deployment • Think Resources - CPU and Memory • Better modeling, easier deployment, faster analytics • Data locality can be a blessing if managed correctly ▪ Scylla’s optimal sharding enable data ingestion without compromising analytics performance
  • 32. Whats Next? • Try it yourself ▪ Here is how to do it: http://www.scylladb.com/kb/scylla-and-spark- integration/ • Performance testing and use cases Tell us about your experience with Scylla and Spark
  • 33. Thank You! Try Presto with Scylla now: https://hub.docker.com/r/tzachl/scylla- and-presto-image/ Contact: tzach@scylladb.com