High Order Bits from Cassandra & Hadoop

srisatish ambati

Points: Use cases. Why NoSQL? Why Cassandra? Use case: Hadoop, Brisk. FUD: consistency. Why is Facebook not using Cassandra? Community, code, tools. Q&A.

Users (Netflix): key by Customer for read-heavy data; key by Customer:Movie for write-heavy data.
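A minimal sketch of the two key designs on this slide, using plain Python dicts as stand-ins for column families. The names `customer_rows`, `viewing_rows`, `row_key`, and the sample ids are hypothetical; this is not Netflix's actual schema.

```python
# Hypothetical read-heavy layout: one row per customer, one column per queued
# movie, so rendering a customer's page is a single-row read.
customer_rows: dict[str, dict[str, str]] = {}

# Hypothetical write-heavy layout: one row per Customer:Movie pair, so frequent
# updates (e.g. playback position) land on many small, independent rows.
viewing_rows: dict[str, dict[str, int]] = {}


def row_key(customer_id: str, movie_id: str) -> str:
    """Compose a Customer:Movie row key."""
    return f"{customer_id}:{movie_id}"


def add_to_queue(customer_id: str, movie_id: str, title: str) -> None:
    customer_rows.setdefault(customer_id, {})[movie_id] = title


def record_position(customer_id: str, movie_id: str, seconds: int) -> None:
    viewing_rows.setdefault(row_key(customer_id, movie_id), {})["position"] = seconds


add_to_queue("c42", "m1", "Metropolis")
add_to_queue("c42", "m2", "Nosferatu")
record_position("c42", "m1", 1200)
record_position("c42", "m1", 1260)  # a later write simply overwrites the column

print(customer_rows["c42"])  # {'m1': 'Metropolis', 'm2': 'Nosferatu'}
print(viewing_rows["c42:m1"])  # {'position': 1260}
```

The read-heavy row concentrates everything a page needs under one key; the write-heavy rows spread hot updates across many keys instead of rewriting one large row.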

Time series (several customers): periodic readings from dev0, dev1, …, keyed as deviceID:metric:timestamp -> value. Metrics are typically a much larger dataset than users.
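A sketch of why the deviceID:metric:timestamp key works: if column names are kept sorted, a time window for one device and metric is a contiguous slice. The `TimeSeriesRow` class, the `column_name` helper, and the zero-padded timestamp encoding are assumptions for illustration; padding makes lexical order match numeric order.

```python
import bisect


class TimeSeriesRow:
    """Keeps column names sorted, the way a wide row orders its columns."""

    def __init__(self) -> None:
        self._names: list[str] = []
        self._values: dict[str, float] = {}

    def insert(self, name: str, value: float) -> None:
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start: str, end: str) -> list[tuple[str, float]]:
        """Return columns with start <= name <= end, in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


def column_name(device_id: str, metric: str, ts: int) -> str:
    # Zero-pad the timestamp so string order equals time order.
    return f"{device_id}:{metric}:{ts:012d}"


row = TimeSeriesRow()
for ts, v in [(100, 0.5), (160, 0.7), (220, 0.9)]:
    row.insert(column_name("dev0", "cpu", ts), v)
row.insert(column_name("dev1", "cpu", 130), 0.2)

# dev0 CPU readings between t=100 and t=200 only; dev1 is never touched.
window = row.slice(column_name("dev0", "cpu", 100), column_name("dev0", "cpu", 200))
print(window)
```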

Reads stay local (dc1, dc2). Replication: multi-datacenter, multi-region EC2 (AWS), multi-availability zones.
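Keeping reads local is largely a question of which consistency level the client asks for. Cassandra documents QUORUM as a majority of all replicas across data centers, (sum of replication factors) / 2 + 1 rounded down, and LOCAL_QUORUM as a majority of the replicas in the coordinator's own data center. The arithmetic below is a sketch; the data center names and replication factors are hypothetical.

```python
def quorum(replication: dict[str, int]) -> int:
    """QUORUM: a majority of all replicas, counted across every data center."""
    return sum(replication.values()) // 2 + 1


def local_quorum(replication: dict[str, int], local_dc: str) -> int:
    """LOCAL_QUORUM: a majority of the replicas in the coordinator's data center."""
    return replication[local_dc] // 2 + 1


def reachable(replication: dict[str, int], down: set[str]) -> int:
    """Replicas that can still answer when whole data centers are unreachable."""
    return sum(rf for dc, rf in replication.items() if dc not in down)


rf = {"dc1": 3, "dc2": 3}  # hypothetical: two EC2 regions, three replicas each
print(quorum(rf), local_quorum(rf, "dc1"))  # 4 2

# If dc2 is cut off, QUORUM can no longer be met, but LOCAL_QUORUM in dc1 can.
print(reachable(rf, {"dc2"}) >= quorum(rf))  # False
print(rf["dc1"] >= local_quorum(rf, "dc1"))  # True
```

Local majorities also avoid a cross-region round trip on every request, which is what keeps read latency regional.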

4.21.2011, Amazon Web Services outage: “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled

4.21.2011, Amazon Web Services outage: Netflix was running on AWS.

Writes: sequential, append-only, ~1-5 ms. On the cloud, ephemeral disks rock!
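Cassandra's write path appends each mutation to a commit log and applies it to an in-memory memtable, which is later flushed to an immutable, sorted SSTable; no write updates data in place. The toy below, with a hypothetical `AppendOnlyStore` and a tiny flush threshold, sketches that shape; it leaves out fsync policy, compaction, and real on-disk SSTables.

```python
import bisect
import json
import os
import tempfile
from typing import Optional


class AppendOnlyStore:
    """Toy write path: append to a commit log, update a memtable, flush sorted runs."""

    def __init__(self, log_path: str, flush_threshold: int = 3) -> None:
        self.log_path = log_path
        self.flush_threshold = flush_threshold
        self.memtable: dict[str, str] = {}
        self.sstables: list[list[tuple[str, str]]] = []  # immutable sorted runs

    def write(self, key: str, value: str) -> None:
        # 1. Sequential append only: the log is never rewritten in place.
        #    (When to fsync is a durability policy choice, omitted here.)
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
        # 2. Apply the write to the in-memory table.
        self.memtable[key] = value
        # 3. Once the memtable is large enough, write it out as one sorted run.
        if len(self.memtable) >= self.flush_threshold:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):  # newest run first
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


with tempfile.TemporaryDirectory() as tmp:
    store = AppendOnlyStore(os.path.join(tmp, "commit.log"))
    for i in range(4):
        store.write(f"k{i}", f"v{i}")
    store.write("k0", "v0-new")  # an update is just another append
    with open(store.log_path, encoding="utf-8") as f:
        log_lines = sum(1 for _ in f)

print(log_lines)  # 5
print(len(store.sstables), store.memtable)  # 1 {'k3': 'v3', 'k0': 'v0-new'}
print(store.read("k0"), store.read("k1"))  # v0-new v1
```

Because every disk write is an append or a one-shot sequential flush, write latency stays low even on spinning or ephemeral disks.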

Reads: local. Key & row caches (also JNA-based off-heap), indexes, materialized. SSDs: improved read performance!
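A row cache serves whole hot rows from memory and skips the disk read path entirely. The sketch below models only that idea as a least-recently-used cache; the `RowCache` class and the sample data are hypothetical, and Cassandra's separate key cache (which caches index positions rather than rows) is not modeled.

```python
from collections import OrderedDict
from typing import Callable, Optional


class RowCache:
    """A toy LRU row cache in front of a slower read path."""

    def __init__(self, capacity: int, load: Callable[[str], Optional[dict]]) -> None:
        self.capacity = capacity
        self.load = load  # stands in for index lookup + SSTable reads
        self.rows: "OrderedDict[str, Optional[dict]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        if key in self.rows:
            self.hits += 1
            self.rows.move_to_end(key)  # mark as most recently used
            return self.rows[key]
        self.misses += 1
        row = self.load(key)
        self.rows[key] = row
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict the least recently used row
        return row

    def invalidate(self, key: str) -> None:
        """Drop a cached row, e.g. after a write to it."""
        self.rows.pop(key, None)


disk = {"alice": {"plan": "dvd"}, "bob": {"plan": "streaming"}}
cache = RowCache(capacity=1, load=disk.get)
for key in ["alice", "alice", "bob", "alice"]:
    cache.get(key)
print(cache.hits, cache.misses)  # 1 3
```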

Clients: CQL, Thrift; pycassa, phpcassa; Hector, Pelops; (Scala, Ruby, Clojure).

Usecase #3: hadoop Hdfs cassandra hive Logs stats analytics

MapReduce: JobTracker, TaskTracker. HDFS: NameNode, DataNode.
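In classic Hadoop, the JobTracker schedules map and reduce tasks onto TaskTrackers, which read their input splits from DataNodes while the NameNode holds file metadata. The sketch below runs the same map, shuffle, reduce shape in a single process for a "logs, stats" job; the log line format and function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (status_code, 1) for each access-log line: 'timestamp path status'."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group mapper output by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items()))


logs = [
    "2011-04-21T10:00 /play 200",
    "2011-04-21T10:01 /play 503",
    "2011-04-21T10:02 /queue 200",
]
print(run_job(logs))  # {'200': 2, '503': 1}
```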

Cloudera, Amazon Elastic MapReduce, Hortonworks, MapR, Brisk.

Use column families (tables): inode, sblock.
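One way to read this slide: a file system can live inside the datastore as two tables, one holding per-file metadata (the inode) and one holding the file's data blocks. The layout below is a hypothetical sketch of that idea with dicts standing in for column families; the field names and the tiny `BLOCK_SIZE` are illustrative assumptions, not Brisk's actual schema.

```python
import uuid

BLOCK_SIZE = 4  # bytes; deliberately tiny so the split is visible

inode: dict[str, dict] = {}  # path -> metadata row (length, ordered block ids)
sblock: dict[str, bytes] = {}  # block id -> block contents


def put_file(path: str, data: bytes) -> None:
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = str(uuid.uuid4())
        sblock[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    inode[path] = {"length": len(data), "blocks": block_ids}


def get_file(path: str) -> bytes:
    meta = inode[path]
    return b"".join(sblock[block_id] for block_id in meta["blocks"])


put_file("/logs/a.txt", b"hello world")
print(inode["/logs/a.txt"]["length"], len(inode["/logs/a.txt"]["blocks"]))  # 11 3
print(get_file("/logs/a.txt"))  # b'hello world'
```

Because blocks are ordinary rows, they inherit the datastore's replication instead of needing a separate NameNode.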

Near-real-time Hadoop. Low latency: cassandra_dc nodes. Batch analytics: brisk_dc nodes.

Consistency: R + W > N. Examples: Oracle, 2-node: R=1, W=2, N=2 (T=2); DNS. *N is the replication factor, not to be confused with T, the total number of nodes.

Tunable and flexible. For high consistency: read at QUORUM, write at QUORUM. For high availability: high W, low R (e.g. W=N, R=1, so any single live replica can serve a read).
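Why R + W > N works: among N replicas, any read set of size R and write set of size W must share at least R + W - N replicas, so when that number is at least 1, every read touches a replica holding the latest acknowledged write. The helper names below are hypothetical; the checks reproduce the settings named on these slides.

```python
def quorum_size(n: int) -> int:
    """Smallest majority of N replicas."""
    return n // 2 + 1


def min_overlap(r: int, w: int, n: int) -> int:
    """Replicas guaranteed to be in both the read set and the write set."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= R, W <= N")
    return max(0, r + w - n)


def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    # R + W > N: at least one replica in every read also saw the latest write.
    return min_overlap(r, w, n) >= 1


n = 3
q = quorum_size(n)
print(is_strongly_consistent(q, q, n))  # True: QUORUM + QUORUM, 2 + 2 > 3
print(is_strongly_consistent(1, 3, n))  # True: high W, low R
print(is_strongly_consistent(1, 1, n))  # False: fastest, only eventually consistent
print(is_strongly_consistent(1, 2, 2))  # True: the 2-node Oracle example
```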

Inbox Search: 600+ cores, 120+ TB (2008). Went from 100M to 500M users. Average NoSQL deployment size: ~6-12 nodes.

Use case #5: search. Apache Solr + Cassandra = Solandra (github.com/tjake/solandra). Other inbox/file searches: Xobni, c3.
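Search on a wide-row store usually means storing an inverted index: one row per term, with a column per matching document. The sketch below is a hypothetical layout illustrating that idea with AND queries; it is not Solandra's actual storage format, and the tokenizer is deliberately naive.

```python
import re

# Hypothetical layout: one row per term; columns are doc ids, values are term counts.
term_rows: dict[str, dict[str, int]] = {}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def index(doc_id: str, text: str) -> None:
    for term in tokenize(text):
        row = term_rows.setdefault(term, {})
        row[doc_id] = row.get(doc_id, 0) + 1


def search(query: str) -> list[str]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    doc_sets = [set(term_rows.get(t, {})) for t in terms]
    return sorted(set.intersection(*doc_sets))


index("msg1", "Cassandra outage on EC2")
index("msg2", "Hadoop jobs on EC2")
print(search("ec2"))  # ['msg1', 'msg2']
print(search("cassandra ec2"))  # ['msg1']
```

Each query term is a single-row read, which is the access pattern a wide-row store handles well.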

“Eventual consistency is harder to program.” Mostly immutable data. Complex systems at scale.
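When data is mostly immutable, replicas rarely disagree; when a column is overwritten, Cassandra reconciles copies by write timestamp, column by column. The sketch below shows last-write-wins merging under that assumption; the `Cell` type is hypothetical, and on a timestamp tie it simply keeps the first copy seen rather than modeling Cassandra's tie-break rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write time supplied with the mutation


def reconcile(*replicas: dict[str, Cell]) -> dict[str, Cell]:
    """Merge replica copies of one row column by column; the newest timestamp wins."""
    merged: dict[str, Cell] = {}
    for row in replicas:
        for column, cell in row.items():
            current = merged.get(column)
            if current is None or cell.timestamp > current.timestamp:
                merged[column] = cell
    return merged


replica_a = {"email": Cell("a@example.org", 10), "plan": Cell("dvd", 5)}
replica_b = {"plan": Cell("streaming", 7)}  # saw a newer write to one column only
print(reconcile(replica_a, replica_b))
```

The merge is order-independent apart from ties, so replicas converge no matter which copy a read or repair sees first.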

Miscellaneous myths: data loss, partial rows. Writes are durable.

Tools: AMIs, DataStax OpsCenter, AppDynamics.

Beautiful C0de = new code(); // less is more: ~90k .java. java.util.concurrent, @annotate. Bloom filters, Merkle trees. Non-blocking, staged event-driven (SEDA). Bigtable, Dynamo.
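Bloom filters let a read skip files that definitely do not contain a key: the filter answers "definitely absent" or "maybe present", never a false "absent". The sketch below derives k bit positions by double hashing one SHA-256 digest; SHA-256 is a stand-in chosen because it is in Python's standard library (Cassandra itself uses a Murmur hash), and the sizes are illustrative.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Answers 'definitely absent' or 'maybe present' for a key."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


bf = BloomFilter(num_bits=1024, num_hashes=5)
for i in range(50):
    bf.add(f"row{i}")
print(bf.might_contain("row7"))  # True: added keys are never reported absent
```

A `False` from `might_contain` lets the read path skip that file without touching disk; a `True` still requires the real lookup.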

Current & future focus: distributed counters, CQL, a simple client, smoother operations, compaction.
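A distributed counter cannot safely do read-modify-write across replicas, because concurrent increments would overwrite each other. A common remedy, sketched below, is to give each replica its own shard of (clock, count) that only it updates; merging keeps the shard with the higher clock, and the counter's value is the sum of shards. This is a hypothetical illustration of the sharding idea, not Cassandra's actual counter implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ShardedCounter:
    """Each replica owns one shard (clock, count); only the owner updates it."""

    shards: dict[str, tuple[int, int]] = field(default_factory=dict)

    def increment(self, replica: str, delta: int) -> None:
        clock, count = self.shards.get(replica, (0, 0))
        self.shards[replica] = (clock + 1, count + delta)

    def merge(self, other: "ShardedCounter") -> None:
        # Keep whichever copy of each shard has seen more updates.
        for replica, (clock, count) in other.shards.items():
            mine = self.shards.get(replica)
            if mine is None or clock > mine[0]:
                self.shards[replica] = (clock, count)

    def value(self) -> int:
        return sum(count for _, count in self.shards.values())


a, b = ShardedCounter(), ShardedCounter()
a.increment("r1", 5)
b.increment("r2", 3)
b.increment("r2", -1)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 7 7
```

Merging is idempotent, so repeated anti-entropy exchanges never double-count an increment.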

Community: robust, rapid. Professional support from DataStax. Filesystem innovation from Acunu. Engineers: independents, startups, large companies, Rackspace, Twitter, Netflix... Come join the efforts!

Use case #4: first NoSQL, then scale! SimpleDB, Cassandra, MongoDB, Cassandra.

Copyright: plantoys … more than one way to do it!

Summary: high-scale, peer-to-peer datastore; best friend for multi-region, multi-zone availability. Hadoop: HDFS engulfing the data world.


Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Safe Software

Elevate Developer Efficiency & build GenAI Application with Amazon Q

Bhuvaneswari Subramani

Recently uploaded (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024

Introduction to Multilingual Retrieval Augmented Generation (RAG)

Apidays New York 2024 - The value of a flexible API Management solution for O...

MINDCTI Revenue Release Quarter One 2024

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...

Architecting Cloud Native Applications

Platformless Horizons for Digital Adaptability

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke

Artificial Intelligence Chap.5 : Uncertainty

DBX First Quarter 2024 Investor Presentation

How to Troubleshoot Apps for the Modern Connected Worker

Exploring Multimodal Embeddings with Milvus

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...

Boost Fertility New Invention Ups Success Rates.pdf

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Elevate Developer Efficiency & build GenAI Application with Amazon Q

High order bits from cassandra & hadoop

1. High-order bits from Cassandra & Hadoop. SriSatish Ambati (@srisatish)

2. NoSQL: know your queries.

3. Talking points: use cases. Why NoSQL? Why Cassandra? Use case: Hadoop, Brisk. FUD: consistency. Why is Facebook not using Cassandra? Community, code, tools. Q&A.

4. Users (Netflix). Key by customer: read-heavy. Key by customer:movie: write-heavy.

5. Time series (several customers): periodic readings from dev0, dev1, …, stored as deviceID:metric:timestamp -> value. Metrics are typically a far larger dataset than users.
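
The deviceID:metric:timestamp -> value layout can be sketched as one row per device:metric with readings stored as columns sorted by timestamp, so a time-range query becomes a contiguous column slice. This is a minimal in-memory sketch, not a Cassandra client; the helper names (`row_key`, `record`, `range_query`) are hypothetical.

```python
import bisect
from collections import defaultdict

# Hypothetical in-memory stand-in for a column family:
# row key "deviceID:metric" -> list of (timestamp, value) columns kept sorted.
rows = defaultdict(list)

def row_key(device_id: str, metric: str) -> str:
    """Compose the row key from device and metric, as on the slide."""
    return f"{device_id}:{metric}"

def record(device_id: str, metric: str, timestamp: int, value: float) -> None:
    """Insert one reading, keeping the row's columns sorted by timestamp."""
    bisect.insort(rows[row_key(device_id, metric)], (timestamp, value))

def range_query(device_id: str, metric: str, start: int, end: int):
    """Return readings with start <= timestamp < end (a column slice)."""
    cols = rows[row_key(device_id, metric)]
    lo = bisect.bisect_left(cols, (start,))
    hi = bisect.bisect_left(cols, (end,))
    return cols[lo:hi]

for ts in (100, 160, 220, 280):
    record("dev0", "temp", ts, ts / 10)
print(range_query("dev0", "temp", 150, 250))  # [(160, 16.0), (220, 22.0)]
```

Keeping the timestamp in the column name rather than the row key is what makes range scans cheap; in practice rows are often also bucketed by time period so that no single row grows without bound.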

6. Why Cassandra?

7. Operational simplicity: peer-to-peer.
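
Peer-to-peer means any node can work out which nodes own a key without asking a master. A minimal consistent-hashing sketch, assuming one MD5-derived token per node (in the spirit of Cassandra's RandomPartitioner) and replicas placed on the next nodes clockwise around the ring (in the spirit of SimpleStrategy); node names are hypothetical.

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Place a key (or a node name) on the ring using an MD5 hash."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # One token per node, derived from its (hypothetical) name.
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, key: str, n: int):
        """The owner is the first node at or after the key's token, wrapping
        around; the next n - 1 nodes clockwise hold the other replicas."""
        start = bisect.bisect_left(self.tokens, (token(key), ""))
        count = min(n, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("customer:42", 3))  # three distinct nodes, same answer on every node
```

Because placement is a pure function of the key and the node list, there is no lookup service to operate or fail over.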

9. Replication: multi-datacenter, multi-region (EC2), multi-availability-zone.

10. Reads stay local (dc1, dc2). Replication: multi-datacenter, multi-region (EC2, AWS), multi-availability-zone.
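
Why local reads matter during a regional outage can be shown with quorum arithmetic. A minimal sketch, assuming a per-datacenter replication map in the spirit of NetworkTopologyStrategy and comparing a LOCAL_QUORUM-style read with a plain cross-datacenter QUORUM; the replication factors and helper names are hypothetical.

```python
# Hypothetical per-datacenter replication factors.
REPLICATION = {"dc1": 3, "dc2": 3}

def local_quorum(dc: str) -> int:
    """Replicas that must answer when only the local datacenter is consulted."""
    return REPLICATION[dc] // 2 + 1

def local_read_ok(dc: str, live: dict) -> bool:
    """A LOCAL_QUORUM-style read needs enough live replicas in `dc` only;
    the state of every other datacenter is irrelevant."""
    return live.get(dc, 0) >= local_quorum(dc)

def global_quorum_ok(live: dict) -> bool:
    """A plain QUORUM read counts replicas across all datacenters."""
    total = sum(REPLICATION.values())
    return sum(live.values()) >= total // 2 + 1

# dc2 (say, a whole region) goes dark.
live = {"dc1": 3, "dc2": 0}
print(local_read_ok("dc1", live), global_quorum_ok(live))  # True False
```

With six replicas in total, a global quorum needs four, so losing one region blocks it, while dc1 can keep serving reads from its own three replicas.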

11. April 21, 2011, Amazon Web Services outage: “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled

12. April 21, 2011, Amazon Web Services outage: Netflix was running on AWS.

13. Fast, durable writes. Fast reads.

14. Writes: sequential, append-only; ~1-5 ms.

15. Writes: sequential, append-only; ~1-5 ms. On the cloud, ephemeral disks rock!
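
The append-only write path (commit log, then memtable, then a sorted immutable flush) can be sketched in a few lines. This is a toy, assuming an in-memory `io.BytesIO` as the commit log, JSON lines as the record format, and a hypothetical flush threshold; it does no fsync and never recycles log segments.

```python
import io
import json

class WritePath:
    """Toy write path: commit-log append, memtable update, SSTable flush."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = io.BytesIO()          # stand-in for the on-disk commit log
        self.memtable = {}                      # key -> value, in memory
        self.sstables = []                      # immutable sorted lists of (key, value)
        self.flush_threshold = flush_threshold  # hypothetical size limit

    def write(self, key: str, value: str) -> None:
        # 1. Durability: append the mutation to the end of the log (no seeks).
        record = json.dumps({"k": key, "v": value}) + "\n"
        self.commit_log.write(record.encode())
        # 2. Apply it in memory; nothing on disk is updated in place.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # 3. Write the memtable out once, sequentially, as a sorted immutable run.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

wp = WritePath()
for i, key in enumerate(["c", "a", "b", "d"]):
    wp.write(key, f"v{i}")
print(wp.sstables)  # [[('a', 'v1'), ('b', 'v2'), ('c', 'v0')]]
print(wp.memtable)  # {'d': 'v3'}
```

Every disk write here is an append, which is why write latency stays low even on disks with poor random-I/O performance.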

16. Reads: local. Key and row caches (also JNA-based off-heap), indexes, materialized.

17. Reads: local. Key and row caches (also JNA-based off-heap), indexes, materialized. SSDs: improved read performance!
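
A row cache lets repeated reads skip the disk entirely; on a miss, the read checks the memtable and then SSTables from newest to oldest. A minimal sketch, assuming whole-row values, newest-SSTable-wins instead of Cassandra's per-column merge, a hypothetical cache capacity, and no cache invalidation on writes.

```python
from collections import OrderedDict

class ReadPath:
    """Toy read path: LRU row cache, then memtable, then SSTables newest-first."""

    def __init__(self, memtable, sstables, cache_size=2):
        self.memtable = memtable        # dict: key -> row
        self.sstables = sstables        # list of dicts, oldest first
        self.cache = OrderedDict()      # LRU row cache
        self.cache_size = cache_size    # hypothetical capacity
        self.disk_reads = 0             # counts SSTable lookups

    def get(self, key):
        if key in self.cache:           # cache hit: no disk access at all
            self.cache.move_to_end(key)
            return self.cache[key]
        row = self.memtable.get(key)
        if row is None:
            for table in reversed(self.sstables):  # newest data wins
                self.disk_reads += 1
                if key in table:
                    row = table[key]
                    break
        if row is not None:
            self.cache[key] = row
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)     # evict least recently used
        return row

rp = ReadPath({"k3": "mem"}, [{"k1": "old", "k2": "x"}, {"k1": "new"}])
print(rp.get("k1"), rp.get("k1"), rp.disk_reads)  # new new 1
```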

18. Clients: CQL, Thrift; pycassa, phpcassa; Hector, Pelops; (Scala, Ruby, Clojure).

19. Use case #3: Hadoop. HDFS, Cassandra, Hive. Logs, stats, analytics.

20. Brisk: truly peer-to-peer Hadoop.

21. Move computation, not data.

23. Parallel Execution View
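
"Move computation, not data" and parallel execution can be illustrated with a generic word-count MapReduce: each node maps over only its own partition, and only small per-node counts are shuffled to the reducer. This is a sketch of the general MapReduce idea, not Brisk's or Hadoop's API; the partitions and node names are hypothetical, and threads stand in for nodes.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data placement: each node holds its own partition of log lines.
PARTITIONS = {
    "node1": ["get /home", "get /movie"],
    "node2": ["post /rate", "get /movie"],
    "node3": ["get /home"],
}

def map_local(lines):
    """Map phase, run where the data lives; Counter also acts as a combiner."""
    return Counter(word for line in lines for word in line.split())

def shuffle(partials):
    """Group partial results by key; only small summaries cross the network."""
    grouped = defaultdict(list)
    for partial in partials:
        for word, count in partial.items():
            grouped[word].append(count)
    return grouped

def reduce_counts(grouped):
    """Reduce phase: combine the per-node partial counts for each key."""
    return {word: sum(counts) for word, counts in grouped.items()}

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_local, PARTITIONS.values()))
totals = reduce_counts(shuffle(partials))
print(totals["get"], totals["/movie"])  # 4 2
```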

25. MapReduce: JobTracker, TaskTracker. HDFS: NameNode, DataNode.

26. Hadoop offerings: Cloudera; Amazon Elastic MapReduce; Hortonworks; MapR; Brisk.

27. NameNode decomposition, explained.

30. Use column families (tables): inode, sblock.
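
The idea of decomposing the NameNode into column families can be sketched with two tables: `inode` holds per-file metadata (what a NameNode keeps in memory) and `sblock` holds the file contents in blocks. The table names come from the slide; the columns, the tiny block size, and the helper functions are hypothetical simplifications, not Brisk's actual schema.

```python
import uuid

BLOCK_SIZE = 4  # hypothetical, tiny so the example is easy to follow

inode = {}   # path -> {"size": ..., "blocks": [block ids]}  (file metadata)
sblock = {}  # block id -> bytes                             (file contents)

def write_file(path: str, data: bytes) -> None:
    """Split data into blocks, store each block, then record the metadata row."""
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = uuid.uuid4().hex
        sblock[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    # Metadata a NameNode would hold becomes an ordinary (replicated) row.
    inode[path] = {"size": len(data), "blocks": block_ids}

def read_file(path: str) -> bytes:
    """Look up the file's block list, then fetch and join the blocks in order."""
    return b"".join(sblock[b] for b in inode[path]["blocks"])

write_file("/logs/day1", b"hello brisk")
print(len(inode["/logs/day1"]["blocks"]))  # 3
print(read_file("/logs/day1"))             # b'hello brisk'
```

Since both tables are ordinary replicated column families, there is no single metadata server to lose.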

31. Near-real-time Hadoop. Low latency: cassandra_dc nodes. Batch analytics: brisk_dc nodes.

32. FUD (acronym): fear, uncertainty, doubt.

33. Consistency: R + W > N. Oracle, 2-node: R=1, W=2, N=2 (T=2). DNS. *N is the replication factor, not to be confused with T, the total number of nodes.
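
Why R + W > N gives strong consistency: if a read contacts R of the N replicas and a write was acknowledged by W of them, the two sets must share at least one replica whenever R + W > N (pigeonhole), so the read sees the latest write. This brute-force sketch checks every possible read set against every possible write set; the function names are illustrative only.

```python
from itertools import combinations

def always_overlap(r: int, w: int, n: int) -> bool:
    """True if every R-replica read set intersects every W-replica write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )

def strongly_consistent(r: int, w: int, n: int) -> bool:
    """The slide's rule: R + W > N."""
    return r + w > n

# The slide's 2-node Oracle-style setup: R=1, W=2, N=2.
print(strongly_consistent(1, 2, 2), always_overlap(1, 2, 2))  # True True
# R=1, W=1 on N=3: a read can land on a replica that missed the write.
print(strongly_consistent(1, 1, 3), always_overlap(1, 1, 3))  # False False
```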

34. Tunable flexibility. For high consistency: read at QUORUM, write at QUORUM. For high availability: high W, low R.
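
The arithmetic behind these two settings is short. A quorum is a majority, floor(N/2) + 1, so reading and writing at QUORUM always gives R + W > N; the slide's availability mix (W = N, R = 1) also satisfies R + W > N, but shifts all the failure tolerance onto reads. A minimal sketch with hypothetical helper names and N = 3 as an example.

```python
def quorum(n: int) -> int:
    """Majority of N replicas: floor(N/2) + 1."""
    return n // 2 + 1

def tolerated_failures(acks: int, n: int) -> int:
    """Replicas that may be down while an operation needing `acks` replies succeeds."""
    return n - acks

N = 3
q = quorum(N)
# High consistency: read QUORUM + write QUORUM, so R + W = 2q > N.
print("quorum:", q, "R+W>N:", 2 * q > N, "each tolerates", tolerated_failures(q, N), "down")
# The slide's high-availability mix: high W (= N), low R (= 1); still R + W > N.
print("R=1 tolerates", tolerated_failures(1, N), "down; W=N tolerates", tolerated_failures(N, N))
```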

36. Inbox Search: 600+ cores, 120+ TB (2008). Went from 100M to 500M users. Average NoSQL deployment size: ~6-12 nodes.

37. Use case #5: search. Apache Solr + Cassandra = Solandra. Other inbox/file searches: Xobni, c3. github.com/tjake/solandra
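
Search on top of a column store rests on an inverted index: one row per term, one column per document containing it, with multi-term queries answered by intersecting rows. This sketch shows only that idea; the layout, tokenizer, and helper names are hypothetical and are not Solandra's actual schema.

```python
import re
from collections import defaultdict

# Hypothetical layout: row key = term, columns = ids of documents containing it.
index = defaultdict(set)

def add_document(doc_id: str, text: str) -> None:
    """Add the document id to the row of every term it contains."""
    for term in re.findall(r"\w+", text.lower()):
        index[term].add(doc_id)

def search(query: str) -> set:
    """AND query: documents that contain every query term."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))

add_document("mail1", "Cassandra meetup tonight")
add_document("mail2", "Hadoop and Cassandra notes")
print(search("cassandra hadoop"))  # {'mail2'}
```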

38. “Eventual consistency is harder to program.” Mostly immutable data. Complex systems at scale.
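
When replicas diverge, Cassandra reconciles them column by column using per-column timestamps: the newest write wins. With mostly immutable data such conflicts are rare, which is part of why eventual consistency is easier to live with than it sounds. A minimal sketch, assuming each cell is a (timestamp, value) pair; breaking ties by comparing values is an assumption made here for determinism.

```python
def reconcile(*replica_rows):
    """Merge replica copies of a row column by column; the newest timestamp wins.
    Ties fall back to comparing values (an assumption made for determinism)."""
    merged = {}
    for row in replica_rows:
        for column, cell in row.items():  # cell = (timestamp, value)
            if column not in merged or cell > merged[column]:
                merged[column] = cell
    return merged

replica_a = {"email": (10, "old@example.com"), "plan": (12, "premium")}
replica_b = {"email": (15, "new@example.com")}
print(reconcile(replica_a, replica_b))
# {'email': (15, 'new@example.com'), 'plan': (12, 'premium')}
```

Because the merge takes a per-column maximum, the order in which replicas are compared does not change the result.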

39. Miscellaneous. Myths: data loss, partial rows. Writes are durable.

40. Three good reasons for Cassandra...

41. Tools: AMIs, OpsCenter (DataStax), AppDynamics.

42. Beautiful C0de = new code(); // less is more. ~90k lines of Java: java.util.concurrent, @annotations. Bloom filters, Merkle trees. Non-blocking, staged event-driven. Bigtable, Dynamo.
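
Cassandra keeps a Bloom filter per SSTable so a read can skip files that cannot contain the key: the filter never gives false negatives and only occasionally gives false positives. A minimal sketch, assuming a bit array with k positions derived from one SHA-256 digest by double hashing; the sizes and hashing scheme are illustrative, not Cassandra's implementation.

```python
import hashlib

class BloomFilter:
    """No false negatives; false positives possible at a tunable rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means 'check the SSTable'."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

bf = BloomFilter()
for key in ("user:1", "user:2"):
    bf.add(key)
print(bf.might_contain("user:1"))  # True
```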

43. Current and future focus: distributed counters, CQL, a simple client, operational smoothing, compaction.
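
Compaction merges several sorted SSTables into one, keeping only the newest cell per key and dropping deleted data. A minimal sketch using a k-way merge, assuming each SSTable is a list of (key, timestamp, value) sorted by key and that `None` marks a tombstone; dropping tombstones immediately is a simplification, since Cassandra keeps them for gc_grace_seconds before purging.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def compact(*sstables):
    """Merge sorted runs of (key, timestamp, value); keep the newest cell per key.
    A value of None stands for a tombstone (deletion) and is dropped here."""
    merged = heapq.merge(*sstables, key=itemgetter(0))  # k-way merge by key
    result = []
    for _key, cells in groupby(merged, key=itemgetter(0)):
        newest = max(cells, key=itemgetter(1))           # highest timestamp wins
        if newest[2] is not None:
            result.append(newest)
    return result

old = [("a", 1, "x"), ("b", 1, "y"), ("d", 1, "z")]
new = [("b", 5, None), ("c", 4, "w"), ("d", 3, "z2")]
print(compact(old, new))
# [('a', 1, 'x'), ('c', 4, 'w'), ('d', 3, 'z2')]
```

Each input is read once, sequentially, and the output is written once, sequentially, which keeps compaction in line with the append-only design.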

44. Community: robust, rapid. Professional support from DataStax. File-system innovation from Acunu. Engineers: independents, startups, large companies, Rackspace, Twitter, Netflix... Come join the effort!

46. Use case #4: first NoSQL, then scale! SimpleDB → Cassandra; MongoDB → Cassandra.

49. Copyright: xkcd

50. Copyright: plantoys … more than one way to do it!

51. Summary: Cassandra is a high-scale peer-to-peer datastore, the best friend for multi-region, multi-zone availability. Hadoop: HDFS is engulfing the data world.

52. Q&A @srisatish

53. NoSQL: know your queries.
