Big data


Kevin Cawley

A quick talk I gave at the meetup in Boulder, Colorado.




2. Cassandra – been actively using for 2+ years

3. Hadoop – 1 yr experience, sort of

5. Options: mongodb, redis, cassandra, couch, hbase, riak, voldemort, dynamodb

6. We're gonna get billions of responses – RDBMS is going to fall over

7. We need nosql... what the hell is that??

8. Problem For Today, cont.
Kevin, response=cassandra, kevin@foo.com
Emma, response=redis, emma@foo.com
Asher, response=cassandra, [email_address]
…
BILLIONS AND BILLIONS OF THESE!!!!

10. Key Value store

11. Ring architecture w/ replication
2^127 tokens
Nodes on the ring: Node 1, Node 2, Node 3, Node 4
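To make the ring idea concrete, here is a minimal Python sketch of key placement on a token ring with replication. It is a simplification, not Cassandra's actual code: the 2^127 token space, the MD5-based `token_for` helper, the evenly spaced node tokens, and the "walk clockwise to the next nodes" replica rule are all assumptions chosen for illustration, and the `Ring` class is hypothetical.

```python
import hashlib
from bisect import bisect_left

TOKEN_SPACE = 2 ** 127  # assumed size of the token range


def token_for(key: str) -> int:
    """Hash a row key onto the ring (MD5 digest reduced into the token space)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


class Ring:
    """Hypothetical ring: each node owns the token range ending at its token."""

    def __init__(self, nodes, replication_factor=2):
        self.nodes = list(nodes)
        # Spread node tokens evenly around the ring (an assumption).
        step = TOKEN_SPACE // len(self.nodes)
        self.tokens = [i * step for i in range(len(self.nodes))]
        # Never ask for more replicas than there are nodes.
        self.rf = min(replication_factor, len(self.nodes))

    def replicas(self, key: str):
        """Return the primary node plus the next rf-1 nodes clockwise."""
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect_left(self.tokens, token_for(key)) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]


ring = Ring(["Node 1", "Node 2", "Node 3", "Node 4"], replication_factor=2)
for key in ("100", "101", "102"):
    print(key, "->", ring.replicas(key))
```

Because placement depends only on the hash of the key, any node can work out which nodes hold a row without consulting a central index.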

12.

13. Column Families – std, dynamic (mo better)
Standard:
key   name           preference
100   kevin cawley   cassandra
101   asher cawley   cassandra
102   emma cawley    redis
Dynamic:
key         202                  201
redis       ['joe','bob']        ['matthias']
cassandra   ['kevin', 'asher']   ['tom']
mongodb     ['holly']            ['dan']
assume User keys as utf8;
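One rough way to picture the two kinds of column family is as nested dictionaries, sketched below in Python. This is only a mental model, not how Cassandra stores data. The variable names and the `add_response` helper are hypothetical, and the slide does not say what the 202 and 201 column names stand for, so they are kept as opaque strings.

```python
# Standard column family: every row has the same, known column names.
users = {
    "100": {"name": "kevin cawley", "preference": "cassandra"},
    "101": {"name": "asher cawley", "preference": "cassandra"},
    "102": {"name": "emma cawley", "preference": "redis"},
}

# Dynamic column family: the row key is the response, and column names
# are data themselves (here the slide's 202 and 201), added as needed.
responses = {
    "redis": {"202": ["joe", "bob"], "201": ["matthias"]},
    "cassandra": {"202": ["kevin", "asher"], "201": ["tom"]},
    "mongodb": {"202": ["holly"], "201": ["dan"]},
}


def add_response(column_family, row_key, column, user):
    """Append a user under a row/column, creating either one on first use."""
    column_family.setdefault(row_key, {}).setdefault(column, []).append(user)


# A new row and a new column appear without any schema change.
add_response(responses, "riak", "203", "emma")
print(responses["riak"])
print(sorted(responses["cassandra"]))
```

The dynamic layout answers "who picked redis?" with a single row read, which is why the rows are keyed by the query rather than by the entity.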

14.

15. Fanning – not the cool refreshing kind

16. Getting phased out
key     202                                                201
redis   {'joe' => 'joe@foo.com', 'bob' => 'bob@boo.com'}   {'matthias' => 'matthias@foo.com', 'tom' => 'tom@boo.com'}

17.

18.

19.

20. We built our own – now free

21. Cassandra is eventually consistent, which makes this hard

22. Be clever and you will win

23. Demo 2

24. Counters
Counter
cassandra   30333
redis       22098
mongodb     24567
couch       12340
...

25.

26.

27.

28. Map - processes a key/value pair to generate a set of intermediate key/value pairs

29. Reduce - function that merges all intermediate values associated with the same intermediate key
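The map and reduce definitions above can be sketched in plain Python, using the three sample responses from the problem slide. This is an in-memory illustration of the idea rather than Hadoop code. The function names (`map_response`, `shuffle`, `reduce_votes`) are hypothetical, and the grouping step that a real framework performs between map and reduce is written out by hand.

```python
from collections import defaultdict

# Sample input: (name, response) pairs from the problem slide.
responses = [
    ("Kevin", "cassandra"),
    ("Emma", "redis"),
    ("Asher", "cassandra"),
]


def map_response(name, response):
    """Map: turn one input pair into intermediate (key, value) pairs."""
    yield (response, 1)


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in practice)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_votes(key, values):
    """Reduce: merge all values that share the same intermediate key."""
    return key, sum(values)


intermediate = [
    pair for name, response in responses for pair in map_response(name, response)
]
totals = dict(reduce_votes(k, v) for k, v in shuffle(intermediate).items())
winner = max(totals, key=totals.get)
print(totals)  # {'cassandra': 2, 'redis': 1}
print(winner)  # cassandra
```

Each map call sees only one record and each reduce call sees only one key, so both phases can be spread across many machines.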

30.

31. cassandra asher

32.

33. Redis 1 AND the winner is cassandra w/ 2 votes!!!

34.

35.

36.

37.

38. Dangerous if you don't know what you are doing

39. Schemaless – ironically, modelling is extremely important.

40.
