Cassandra 3.0 - JSON at scale - StampedeCon 2015StampedeCon
This session will explore the new features in Cassandra 3.0, starting with JSON support. Cassandra now allows storing JSON directly to Cassandra rows and vice versa, making it trivial to deploy Cassandra as a component in modern service-oriented architectures.
Cassandra 3.0 also delivers other enhancements to developer productivity: user defined functions let developers deploy custom application logic server side with any language conforming to the Java scripting API, including Javascript. Global indexes allow scaling indexed queries linearly with the size of the cluster, a first for open-source NoSQL databases.
Finally, we will cover the performance improvements in Cassandra 3.0 as well.
Cassandra, Modeling and Availability at AMUGMatthew Dennis
brief high level comparison of modeling between relational databases and Cassandra followed by a brief description of how Cassandra achieves global availability
We describe the features of Oak Lucene indexes and how they can be used to get your queries perform better. In the second part we will talk about how asynchronous indexing works in general and how it can be monitored.
This was presented as part of AEM Gem Series -http://dev.day.com/content/ddc/en/gems/oak-lucene-indexes.html
Using Spark to Load Oracle Data into CassandraJim Hatcher
This presentation describes how you can use Spark as an ETL tool to get data from a relational database into Cassandra. I go through the concept in general and then talk about some specific issues you might run into and how to fix them.
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...DataStax
The 3.0 storage engine re-write is the biggest and most exciting change to ever happen in Apache Cassandra. The new storage engine can efficiently store and read data from disk using the same concepts present in the CQL 3 language. This has delivered large space savings, and creates new performance characteristics.
In this talk Aaron Morton, Co Founder at The Last Pickle and Apache Cassandra Committer, will discuss the 3.0 storage engine, its layout and performance characteristics.
About the Speaker
Aaron Morton CEO, The Last Pickle
Aaron Morton is the Co Founder & CEO at The Last Pickle (thelastpickle.com), a professional services company that works with clients to deliver and improve Apache Cassandra based solutions. He's based in New Zealand, is an Apache Cassandra Committer and a DataStax MVP for Apache Cassandra.
In real life, almost any project deals with tree structures. Different kinds of taxonomies, site structures, etc. require modeling of hierarchy relations.
Typical approaches used
● Model Tree Structures with Child References
● Model Tree Structures with Parent References
● Model Tree Structures with an Array of Ancestors
● Model Tree Structures with Materialized Paths
● Model Tree Structures with Nested Sets
Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple data-centres,with asynchronous master-less replication allowing low latency operations for all clients.
Cassandra Tutorial | Data types | Why Cassandra for Big Datavinayiqbusiness
Apache Cassandra is an open-source, NoSQL, wide column data store that can quickly take and process huge amounts of data.
It is decentralized, distributed, scalable, highly available, and fault-tolerant, with identical nodes that are clustered together to eliminate single points of failure.
Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above.
Cassandra's Sweet Spot - an introduction to Apache CassandraDave Gardner
Slides from my NoSQL Exchange 2011 talk introducing Apache Cassandra. This talk explained the fundamental concepts of Cassandra and then demonstrated how to build a simple ad-targeting application using PHP, with a focus on data modeling.
Video of talk: http://skillsmatter.com/podcast/home/cassandra/js-2880
Scala with MongoDB
MongoDB is an open source, document-oriented NoSQL (schemaless, non-relational) database management system designed for performance, horizontal scalability, high availability, and advanced queryability. MongoDB is a document-based database system, and as a result, all records, or data, in MongoDB are documents.
Apache Cassandra, part 1 – principles, data modelAndrey Lomakin
Aim of this presentation to provide enough information for enterprise architect to choose whether Cassandra will be project data store. Presentation describes each nuance of Cassandra architecture and ways to design data and work with them.
Introduciton to Apache Cassandra for Java Developers (JavaOne)zznate
The database industry has been abuzz over the past year about NoSQL databases. Apache Cassandra, which has quickly emerged as a best-of-breed solution in this space, is used at many companies to achieve unprecedented scale while maintaining streamlined operations.
This presentation goes beyond the hype, buzzwords, and rehashed slides and actually presents the attendees with a hands-on, step-by-step tutorial on how to write a Java application on top of Apache Cassandra. It focuses on concepts such as idempotence, tunable consistency, and shared-nothing clusters to help attendees get started with Apache Cassandra quickly while avoiding common pitfalls.
This presentation explains how to get started with Apache Cassandra to provide a scale out, fault tolerant backend for inventory storage on OpenSimulator.
Detail behind the Apache Cassandra 2.0 release and what is new in it including Lightweight Transactions (compare and swap) Eager retries, Improved compaction, Triggers (experimental) and more!
• CQL cursors
Book: Software Architecture and Decision-MakingSrinath Perera
Uncertainty is the leading cause of mistakes made by practicing software architects. The primary goal of architecture is to handle uncertainty arising from user cases as well as architectural techniques. The book discusses how to make architectural decisions and manage uncertainty. From the book, You will learn common problems while designing a system, a default solution for each, more complex alternatives, and 5Q & 7P (Five Questions and Seven Principles) that help you choose.
Book, https://amzn.to/3v1MfZX
Blog: http://tinyurl.com/swdmblog
Six min video - https://youtu.be/jtnuHvPWlYU
We have critically evaluated how AI will shape integration use cases, their feasibility, and timelines. Emerging Technology Analysis Canvas (ETAC), a framework built to analyze emerging technologies, is the methodology of our study.
We observe that AI can significantly impact integration use cases and identify 13 AI-based use case classes for integration. Points to note include:
Enabling AI in an enterprise involves collecting, cleaning up, and creating a single representation of data as well as enforcing decisions and exposing data outside, each of which leads to many integration use cases. Hence, AI indirectly creates demand for integration.
AI needs data, which in some cases lead to significant competitive advantages. The need to collect data would drive vendors to offer most AI products in the cloud through APIs.
Due to lack of expertise and data, custom AI model building will be limited to large organizations. It is hard for small and medium size organization to build and maintain custom models.
The Role of Blockchain in Future IntegrationsSrinath Perera
We have critically evaluated blockchain-based integration use cases, their feasibility, and timelines. Emerging Technology Analysis Canvas (ETAC), a framework built to analyze emerging technologies, is the methodology of our study. Based on our analysis, we observe that blockchain can significantly impact integration use cases.
In our paper, we identify 30-plus blockchain-based use cases for integration and four architecture patterns. Notably, each use case we identified can be implemented using one of the architecture patterns. Furthermore, we also discuss challenges and risks posed by blockchains that would affect these architecture patterns.
Our webinar presents a critical analysis of serverless technology and our thoughts about its future. We use Emerging Technology Analysis Canvas (ETAC), a framework built to analyze emerging technologies, as the methodology of our study. Based on our analysis, we believe that serverless can significantly impact applications and software development workflows.
We’ve also made two further observations:
Limitations, such as tail latencies and cold starts, are not deal breakers for adoption. There are significant use cases that can work with existing serverless technologies despite these limitations.
We see a significant gap in required tooling and IDE support, best practices, and architecture blueprints. With proper tooling, it is possible to train existing enterprise developers to program with serverless. If proper tools are forthcoming, we believe serverless can cross the chasm in 3-5 years.
A detailed analysis can be found here: A Survey of Serverless: Status Quo and Future Directions. Join our webinar as we discuss this study, our conclusions, and evidence in detail.
1. Blockchain potential impact is real. If successful, Blockchain technologies can transform the way we live our day to day lives.
2. We believe technology is ready for limited applications in Digital Currency, Lightweight financial systems, Ledgers (of identity, ownership, status, and authority), Provenance (e.g. supply chains and other B2B scenarios) and Disintermediation, which we believe will happen in next three years.
3. However, with other use cases, blockchain faces significant challenges such as performance, irrevocability, need for regulation and lack of consensus mechanisms. These are hard problems and
4. It is not clear whether blockchain can sustain the current level of effort for extended period of 5+ years. There are many startups and they run the risk of running out of money before markets are ready. Failure of startups can inhibit further funding and investments.
5. Value and need of decentralization compared to centralized and semi-centralized alternatives is not clear.
A Visual Canvas for Judging New TechnologiesSrinath Perera
In the fast-changing technology world, the technology landscape shifts faster and faster. The agents of these changes are new emerging technologies, which sometimes even create, destroy, or transform segments. In a shifting world, prevailing advantages are fleeting. Organizations that can master change and ride technology waves own the future.
Not all emerging technologies live up to their promise. Every year, as a part of annual planning, most organizations need to decide relevance, impact, and the probability of success of emerging technologies and pick their bets. Although it is a regular decision there is no widely accepted framework for evaluating emerging technologies.
As a solution to this problem, we present “Emerging Technology Analysis Canvas” (ETAC), a framework to assess an individual emerging technology as a solution to this problem. Inspired by the Business Model Canvas, It represents different aspects of technology visually on a single page. This approach includes a set of questions that probe the technology arranged around a logical narrative. The visual representation is concise, compact, and comprehensible in a glance.
The talk discusses how analytics can attack privacy and what we can do about it. It discusses the legal responses (e.g. GDPR) as well technical responses ( differential privacy and homomorphic encryption).
The video is in https://www.facebook.com/eduscopelive/videos/314847475765297/ from 1.18.
Blockchain is often cited as one of the most impactful technology along with AI. It has attracted many startups, venture investments, and academic research. If successful, Blockchain technologies can transform the way, we live our day to day lives.
However, blockchain faces significant challenges such as performance, irrevocability, need for regulation and lack of consensus mechanisms. They are hard problems, and it will likely take at least 5-10 years to find answers to those problems.
Given the risk involved as well as the significant potential returns, we recommend a cautiously optimistic approach for blockchain with the focus on concrete use cases.
Today's Technology and Emerging Technology LandscapeSrinath Perera
We have seen the rise and fall of many technologies, some disappearing without a trace while others redefining the world. Collectively they have shaped our world beyond recognition. In this talk, Srinath will start with past technologies exploring their behavior. Then he will explore current middleware landscape, its composition, and relationships between different segments. He will discuss significant developments and discuss their future. Further, he will discuss emerging technologies, forces that shape them, and the promise of each technology, and finally, speculate about their evolution. You will walk away with knowledge on the evolution of middleware, the status quo, and discussion about how, at WSO2, we think those technologies will evolve.
Some died, some get by, but some have woven themselves to today's middleware so much that we do not notice them. The point I want to make is that not all emerging technologies are fads. Some are, and some are too early, like AI. But some are lasting.
The Rise of Streaming SQL and Evolution of Streaming ApplicationsSrinath Perera
First-generation stream processors, such as Apache Storm, wanted us to write code. It was a great start. However, when building real-world apps, which are used for a long time and evolve, writing code gets us into trouble.
If we want to query a database or query data stored in storage with Hadoop, we use SQL. Why can't we query data streaming using SQL? We can. Almost all open source stream processors, including Storm, Flink, and Kafka, have switched to SQL.
In this webinar, Srinath will talk about the evolution of stream processing, streaming SQL, the status quo, and what this means to stream applications. He will also dissect the experience of building streaming applications by exploring common patterns and pitfalls.
Analytics and AI: The Good, the Bad and the UglySrinath Perera
Analytics let us question the data, which in effect questions the world around us. This let us understand, monitor, and shape the world. AI let us discover connections, predict the possible futures and automate tasks.
These twin technologies can change the world around us. On one hand, make us efficient, connected, and fulfilled. At the same time, the change of status quo can replace jobs, affect lives and build biases into our systems that can marginalize millions.
In this talk, we will discuss core ideas behind analytics and AI, their possible impact, both good and bad outcomes, and challenges.
The dawn of digital businesses is upon us, with reimagined business models that make the best use of digital technologies such as automation, analytics, integration and cloud. Digital businesses are efficient, continuously optimizing, proactive, flexible and are able to fully understand their customers. Analytics is a key technology that helps in doing so. It acts as the eyes and ears of the system and provides a holistic view on the past and present so that decision-makers can predict what will happen in the future. This webinar will explore
Why becoming a digital business is not a choice
The role of analytics in digital transformation with examples
How best to leverage state of the art analytics technology
SoC Keynote:The State of the Art in Integration TechnologySrinath Perera
This talk discusses Outline of the state of the art of Enterprise Software and how we get there, as I see it. Also second part describes Ballerina, a new programming language WSO2 has built for Enterprise Computing.
It is presented as a Keynote at 11th Symposium and Summer School On Service-Oriented Computing.
Introduction to Apache Cassandra and support within WSO2 Platform
1. Introduction to Apache Cassandra and support within WSO2 Platform Srinath Perera WSO2 Inc.
2. Cassandra within the WSO2 Platform
• We support Apache Cassandra within the WSO2 Platform.
• This is to provide NoSQL data support within the platform.
• Cassandra can be used for both column family and key-value pair use cases.
• It is fully integrated with the platform. We will discuss what this means.
3. What is Cassandra?
• Apache Cassandra: http://cassandra.apache.org/
• A NoSQL column family implementation (more about it later).
• Highly scalable and available, with no single point of failure.
• Very high write throughput and good read throughput. It is pretty fast.
• SQL-like query language (from 0.8) and support for search through secondary indexes (but no JOINs, GROUP BY, etc.).
• Tunable consistency and support for replication.
• Flexible schema.
4. Column Family Data Model
• Column: a name, a value, and a timestamp (ignore the timestamp for now). "Column" is a bit of a misnomer; maybe they should have called it a named cell. E.g. author="Asimov".
• Row: a collection of columns with a name. Entries are sorted by column name. You can do a slice and get only some of the columns. E.g. "Second Foundation"->{author="Asimov", publishedDate="..", tag="sci-fi", tag2="asimov"}
• Column family: a collection of rows, usually with no sort order among rows*. E.g. Books->{ "Foundation"->{author="Asimov", publishedDate=".."}, "Second Foundation"->{author="Asimov", publishedDate=".."}, ….. }
• There are other constructs, but these are the key ones.
5. Column Family Data Model (Contd.)
• It is crucial to understand that Cassandra columns are very different from RDBMS columns.
• Columns apply only within a given row; different rows may have different columns.
• You can have thousands to millions of columns in a row (2 million max, and a row should fit in one node).
• Column names may represent data, not just metadata as in an RDBMS.
• You will understand more with the example.
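The column family model described on the last two slides can be pictured as a map of rows, where each row is itself a map of columns kept sorted by column name. The following is only an in-memory sketch of that idea, not Cassandra's implementation: the class `ColumnFamilySketch` and its methods are hypothetical names, and timestamps are left out, as the slide suggests.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of the column family model; not a Cassandra API.
public class ColumnFamilySketch {
    // Row key -> columns of that row, sorted by column name.
    // Rows themselves are kept in no particular order.
    private final Map<String, NavigableMap<String, String>> rows = new HashMap<>();

    // Each row may carry its own set of columns; no schema is enforced.
    public void insert(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // A slice: the columns of one row whose names fall in [from, to].
    public SortedMap<String, String> slice(String rowKey, String from, String to) {
        NavigableMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return new TreeMap<>();
        }
        return new TreeMap<>(row.subMap(from, true, to, true));
    }

    public static void main(String[] args) {
        ColumnFamilySketch books = new ColumnFamilySketch();
        books.insert("Foundation", "author", "Asimov");
        books.insert("Second Foundation", "author", "Asimov");
        books.insert("Second Foundation", "tag", "sci-fi");
        books.insert("Second Foundation", "tag2", "asimov");
        // Only the tag columns of one row are returned.
        System.out.println(books.slice("Second Foundation", "tag", "tag2"));
    }
}
```

Because columns are sorted by name, a slice is a cheap range read over one row, which is what makes column names usable as data.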
7. Example: Book Rating Site
• Let us take a book rating site as an example. Users add books, comment on them, and tag them.
• Can add books (author, rank, price, link)
• Can add comments for books (text, time, name)
• Can add tags for books
• Need to list books sorted by rank
• Need to list books by tag
• Need to list comments for a book
8. Relational Approach
Schema
• Books(bookid, author, rank, price, link)
• Comments(id, text, user, time, bookid)
• Tags(id, bookid, tag)
Queries
• SELECT * FROM Books ORDER BY rank;
• SELECT text, time, user FROM Comments WHERE bookid=? ORDER BY time;
• SELECT tag FROM Tags WHERE bookid=?;
• SELECT bookid FROM Tags WHERE tag="";
• SELECT DISTINCT author FROM Tags, Books WHERE Tags.bookid=Books.bookid AND tag=?;
11. Some Queries You Cannot Do
• The above setup can answer the queries it was designed for. It cannot answer queries it was not designed for.
• For example, it cannot do the following:
    SELECT * FROM Books WHERE price > 50;
    SELECT * FROM Books WHERE author="Asimov";
    SELECT * FROM Books, Comments WHERE rank > 9 AND Comments.bookid=Books.bookid;
• Well, it can, but only by writing code that walks through the data. That is like supporting search by going through all the data.
• This is a limitation, especially when queries are provided at runtime.
12. A Sample Program
    // Uses the Hector client. keyspaceName and columnFamily are not defined on the slide
    // and are assumed to be String variables set elsewhere; sser is presumably a
    // StringSerializer instance (e.g. StringSerializer.get()).
    Cluster cluster = HFactory.createCluster("TestCluster",
            new CassandraHostConfigurator("localhost:9160"));
    Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);

    // Write one column ("address") into the row with key "wso2".
    Mutator<String> mutator = HFactory.createMutator(keyspace, sser);
    mutator.insert("wso2", columnFamily,
            HFactory.createStringColumn("address", "4131 El Camino Real Suite 200, Palo Alto, CA 94306"));

    // Read the same column back.
    ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily(columnFamily).setKey("wso2").setName("address");
    QueryResult<HColumn<String, String>> result = columnQuery.execute();
    System.out.println("received " + result.get().getName() + " = "
            + result.get().getValue() + " ts = " + result.get().getClock());
13. Cassandra: How does it work?
• Nodes are arranged in a circle according to a key space (a P2P network that uses consistent hashing).
• Each node owns the next clockwise address space.
• If replicated, each node owns the next two clockwise address spaces.
• Any node can accept any request and route it to the correct node.
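The ring placement described above can be sketched with a sorted map from ring positions to nodes. This is a toy model under stated assumptions: `RingSketch`, the integer tokens, and the direction convention (a key goes to the first node at or after its position, wrapping around) are illustrative choices, and `tokenOf` is a stand-in hash, not Cassandra's partitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of a consistent-hashing ring; not Cassandra code.
public class RingSketch {
    // Ring position (token) -> node name, kept sorted so we can walk clockwise.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String name, int token) {
        ring.put(token, name);
    }

    // The first node at or clockwise after the key's position owns the key;
    // further replicas go to the nodes that follow it on the ring.
    public List<String> replicasFor(int keyToken, int replicationFactor) {
        List<String> owners = new ArrayList<>();
        if (ring.isEmpty()) {
            return owners;
        }
        Integer token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey(); // wrap around the circle
        }
        int count = Math.min(replicationFactor, ring.size());
        while (owners.size() < count) {
            owners.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return owners;
    }

    // Stand-in hash onto a ring of 100 positions (an assumption for the demo).
    public static int tokenOf(String key) {
        return Math.floorMod(key.hashCode(), 100);
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addNode("A", 10);
        ring.addNode("B", 40);
        ring.addNode("C", 70);
        // Any node could run this lookup and forward the request to the owners.
        System.out.println(ring.replicasFor(tokenOf("Second Foundation"), 2));
    }
}
```

Since every node can compute the same lookup from the ring layout, any node can accept a request and route it, with no central coordinator.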
14. Cassandra: How does it work? (Contd.)
• Writes are written to enough nodes, and Cassandra repairs data while reading. (As you would guess, that is how writes are fast.)
• Data is updated in memory, and Cassandra keeps an append-only commit log to recover from failures. (This avoids rotational latency at the disk.) It can do about 80-360MB/sec per node.
• Whenever a read happens, Cassandra syncs all the nodes holding replicas (read repair).
15. All of this is great, but what is the catch? Do not get me wrong: Cassandra is a great tool, but you have to know where it does not work.
16. Surprises if you are using Cassandra No transactions, no JOINs. Hope there is no surprise here. No foreign keys, and keys are immutable. (Well, there are no JOINs anyway; use surrogate keys if you need to change keys.) Keys have to be unique (use composite keys). Super Columns and the order-preserving partitioner are discouraged. Searching is complicated: there is no search in the core. Secondary indexes are layered on top, and they do not do range search or pattern search. When secondary indexes do not work, you have to learn the data model and build your own indexes using sort orders and slices. Sort orders are complicated: columns are always sorted by name, but row order depends on the partitioner. Sort orders are crucial when you build your own indexes.
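A hand-built index of the kind mentioned on this slide can be sketched in plain Java, using a sorted map to stand in for a single index row whose columns are sorted by name. The class PriceIndexSketch and its methods are hypothetical, and the sketch assumes the column comparator orders prices numerically. It answers the earlier "price > 50" style of query with a slice over the sorted columns instead of a full scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: one "index row" with column name = price, value = book ids.
class PriceIndexSketch {
    // TreeMap keeps entries sorted by key, like columns sorted by name within a row.
    private final TreeMap<Integer, List<String>> booksByPrice = new TreeMap<>();

    // The application must call this alongside every write to the Books data.
    void index(String bookId, int price) {
        booksByPrice.computeIfAbsent(price, p -> new ArrayList<>()).add(bookId);
    }

    // Equivalent of "price > minPrice": a slice over the sorted column names.
    List<String> pricedAbove(int minPrice) {
        List<String> ids = new ArrayList<>();
        for (List<String> atPrice : booksByPrice.tailMap(minPrice, false).values()) {
            ids.addAll(atPrice);
        }
        return ids;
    }

    public static void main(String[] args) {
        PriceIndexSketch idx = new PriceIndexSketch();
        idx.index("foundation", 60);
        idx.index("dune", 45);
        idx.index("neuromancer", 75);
        System.out.println(idx.pricedAbove(50)); // prints [foundation, neuromancer]
    }
}
```

The cost of this design is that the application, not the database, is responsible for keeping the index row consistent with the data it points to.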
17. Surprises if you are using Cassandra (Cont.) Failed operations may leave changes: if an operation is successful, all is well. If it failed, changes may still have been applied. But operations are idempotent, so you can retry until successful. Batch operations are not atomic, but you can retry until successful (as operations are idempotent). If a node fails, Cassandra does not detect it and heal itself. Assuming you have replicas, things will continue to work, but the whole system recovers only when a manual recovery operation is done. It remembers deletes: when we delete a data item, a node may be down at the time and may come back after the delete is done. To keep the deleted item from reappearing, Cassandra marks it as deleted (a tombstone) but does not remove it until a configurable timeout or a repair. Space is actually freed up only then.
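The "retry until successful" advice can be sketched as a small helper. This is a minimal illustration, not part of Hector or Cassandra: the RetrySketch class, the retry method, and the maxAttempts cap are all hypothetical. Retrying like this is only safe because the operation is idempotent, so applying it twice leaves the same result as applying it once.

```java
import java.util.function.Supplier;

// Hypothetical retry helper; safe only for idempotent operations.
class RetrySketch {
    // Runs the operation until it succeeds or maxAttempts is reached,
    // then rethrows the last failure.
    static <T> T retry(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // the write may or may not have been applied; try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = retry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("simulated timeout");
            }
            return "written";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints written after 3 attempts
    }
}
```

In real use the supplier would wrap a client call such as an insert, and a bounded attempt count (possibly with a delay between attempts) avoids retrying forever against a cluster that is down.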
19. Cassandra within the WSO2 Platform Cassandra is a part of the WSO2 data solutions, because one storage cannot handle all cases. It is specifically for applications that need to scale. For applications that can work with a single DB, we have "Database as a Service". Two offerings: provide Cassandra as a Service, and provide Cassandra within Carbon as a standalone product (integrated with the WSO2 security model).
20. Apache Cassandra as a Service Users can log in to the Web Console (both in Stratos and in WSO2 Data Server) and create Cassandra key spaces.
21. Apache Cassandra as a Service (Contd.) Key spaces are allocated from a Cassandra cluster. They are isolated from other tenants in Stratos, and access is integrated with the WSO2 security model. Users can manage and share their key spaces through the Stratos Web Console and use those key spaces through the Hector client (a Java client for Cassandra). In essence, we provide Cassandra as a part of Stratos as a Service, with multi-tenancy support and security integration with the WSO2 security model.
22. A Sample Program
// Credentials of the tenant user (USERNAME_KEY and PASSWORD_KEY are constants defined elsewhere).
Map<String, String> credentials = new HashMap<String, String>();
credentials.put(USERNAME_KEY, "admin@srinath.org");
credentials.put(PASSWORD_KEY, "admin1234");
// Connect with credentials; the rest is the same as the earlier sample.
Cluster cluster = HFactory.createCluster("TestCluster", new CassandraHostConfigurator("localhost:9160"), credentials);
Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);
// sser is the serializer for String row keys (for example, StringSerializer.get()).
Mutator<String> mutator = HFactory.createMutator(keyspace, sser);
mutator.insert("wso2", columnFamily, HFactory.createStringColumn("address", "4131 El Camino Real Suite 200, Palo Alto, CA 94306"));
ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
columnQuery.setColumnFamily(columnFamily).setKey("wso2").setName("address");
QueryResult<HColumn<String, String>> result = columnQuery.execute();
System.out.println("received " + result.get().getName() + "= " + result.get().getValue() + " ts = " + result.get().getClock());
24. Implementation (Contd.) Cassandra includes a plug point for adding different security models at the server (authentication and authorization). We do security integration and support isolation among tenants (multi-tenancy) by writing a new implementation of this plug point. We also provide a Web console to manage Cassandra key spaces. Cassandra is highly scalable and highly available, so no work is needed in that department.
25. Cassandra within Carbon Platform Users may also choose to run Carbon-enabled Cassandra in two alternative settings. Running the whole of Stratos within a private cloud: this gets full support for multi-tenancy and other cloud benefits, and lets users run it in their own controlled environment. Running a standalone Cassandra node (without multi-tenancy): this gets seamless integration with the WSO2 security model and can use the Configuration Console for Cassandra.
27. Summary We discussed what Cassandra is, its strengths and weaknesses, and the Column Family data model. It has a data model very different from the relational style, so users need to rethink their data model. There is complexity at design time, which is the tradeoff for achieving higher scalability. Of course, Cassandra is not the solution for everything. It should be used when it makes sense for the use case. We also discussed Cassandra integration with the WSO2 platform. Carbon integration: how to run Cassandra integrated with the WSO2 Carbon platform security model. Cassandra as a Service: how to use Cassandra as a Service from the WSO2 Stratos Platform as a Service offering.
28. References Apache Cassandra: http://cassandra.apache.org. Understanding the Column Family model: http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model. Hector client: http://github.com/rantav/hector and http://prettyprint.me/2010/08/06/hector-api-v2/. Some theory: Lakshman, A. and Malik, P., Cassandra: A Decentralized Structured Storage System. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A. and Gruber, R.E., Bigtable: A Distributed Storage System for Structured Data, ACM Transactions on Computer Systems (TOCS), 2008.