Couchbase is a popular open source NoSQL platform used by giants like Apple, LinkedIn, Walmart, Visa, and many others, and runs on premises or in public, hybrid, and multi-cloud environments.
Couchbase integrates a sub-millisecond key/value cache with a document-based database, a unique combination, along with many more services and features.
In this session we will talk about the unique architecture of Couchbase; N1QL, its SQL-like, ANSI-compliant query language; and the services and features Couchbase offers, and demonstrate some of them live.
We will also discuss what makes Couchbase different from other popular NoSQL platforms like MongoDB, Cassandra, Redis, and DynamoDB.
At the end we will talk about the next version of Couchbase (6.5), due to be released later this year, and about Couchbase 7.0, due next year.
MongoDB is an open-source document database and the leading NoSQL database, written in C++.
MongoDB has official drivers for a variety of popular programming languages and development environments. There are also a large number of unofficial or community-supported drivers for other programming languages and frameworks.
We believe that security *IS* a shared responsibility: when we give developers the power to create infrastructure, security becomes their responsibility, too.
During this meetup, we'd like to share our experience implementing security best practices that development teams can apply directly to build more robust and secure cloud environments. Make cloud security your team's sport!
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala (DataStax Academy)
What You Will Learn At This Meetup:
• Review of Cassandra analytics landscape: Hadoop & HIVE
• Custom input formats to extract data from Cassandra
• How Spark & Shark increase query speed & productivity over standard solutions
Abstract
This session covers our experience using the Spark and Shark frameworks for running real-time queries on top of Cassandra data. We will start by surveying the current Cassandra analytics landscape, including Hadoop and HIVE, and touch on the use of custom input formats to extract data from Cassandra. We will then dive into Spark and Shark, two memory-based cluster computing frameworks, and how they enable often dramatic improvements in query speed and productivity over the standard solutions today.
About Evan Chan
Evan Chan is a Software Engineer at Ooyala. In his own words: I love to design, build, and improve bleeding edge distributed data and backend systems using the latest in open source technologies. I am a big believer in GitHub, open source, and meetups, and have given talks at conferences such as the Cassandra Summit 2013.
South Bay Cassandra Meetup URL: http://www.meetup.com/DataStax-Cassandra-South-Bay-Users/events/147443722/
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World (Ajay Gupte)
In the analytics world, you often need to process many millions or billions of documents to generate a single report. Novel techniques have been developed for exploiting modern processor architectures (larger on-chip caches, SIMD processing, compression, vector processing, a columnar approach), and this technology is now available for processing your large JSON data. This talk will discuss analysis of JSON data using advanced data warehousing techniques, making it simple and seamless for the application/tool developer.
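As a rough, library-free sketch of the columnar idea the abstract alludes to (all field names here are illustrative, not the speaker's actual implementation): instead of scanning each JSON document row by row, the documents are shredded into per-field arrays, so an aggregate touches only the columns it needs.

```python
# Shred JSON documents into per-field columns, then aggregate one column.
docs = [
    {"user": "a", "amount": 10.0, "country": "US"},
    {"user": "b", "amount": 25.5, "country": "DE"},
    {"user": "c", "amount": 4.5,  "country": "US"},
]

# Columnar layout: one list per field instead of one dict per document.
columns = {key: [d[key] for d in docs] for key in docs[0]}

# A report like "total amount for US users" now reads only two columns.
total_us = sum(
    amt for amt, c in zip(columns["amount"], columns["country"]) if c == "US"
)
print(total_us)  # 14.5
```

Real engines add compression and SIMD on top of exactly this layout; the data-access pattern is the same.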
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra (Caserta)
Businesses are generating and ingesting an unprecedented volume of structured and unstructured data to be analyzed. What is needed is a scalable Big Data infrastructure that processes and parses extremely high volumes in real time and calculates aggregations and statistics. Banking trade data, where volumes can exceed billions of messages a day, is a perfect example.
Firms are fast approaching 'the wall' in terms of scalability with relational databases. They must stop imposing relational structure on analytics data, map raw trade data to a data model at low latency, persist the mapped data to disk, and handle ad-hoc requests for data analytics.
Joe discusses and introduces NoSQL databases, describing how they are capable of scaling far beyond relational databases while maintaining performance, and shares a real-world case study that details the architecture and technologies needed to ingest high-volume data for real-time analytics.
For more information, visit www.casertaconcepts.com
Slide deck presented at http://devternity.com/ on MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistence models, as well as the definition of documents and general data structures.
Open Source 101 and Quest InSync presentations, March 30th, 2021, on MySQL Ind... (Dave Stokes)
Speeding up queries on a MySQL server with indexes and histograms is not a mysterious art but simple engineering. This presentation is an in-depth introduction that was presented on March 30th to the Quest InSync and Open Source 101 conferences.
MySQL can now be used as a NoSQL JSON document store, so you get the best of the NoSQL and SQL worlds. This talk covers the features of the X DevAPI and the MySQL Document Store, and how to use relational tables with the new SQL features.
Polyglot Database - Linuxcon North America 2016 (Dave Stokes)
Many relational databases are adding NoSQL features to their products. So what happens when you can get direct access to the data as a key/value pair, or store an entire document in a column of a relational table, and more?
Apache Cassandra is a leading open-source distributed database capable of amazing feats of scale, but its data model requires a bit of planning for it to perform well. Of course, the nature of ad-hoc data exploration and analysis requires that we be able to ask questions we hadn’t planned on asking—and get an answer fast. Enter Apache Spark.
Spark is a distributed computation framework optimized to work in-memory, and heavily influenced by concepts from functional programming languages. It’s exactly what a Cassandra cluster needs to deliver real-time, ad-hoc querying of operational data at scale.
In this talk, we’ll explore Spark and see how it works together with Cassandra to deliver a powerful open-source big data analytic solution.
What Your Database Query is Really Doing (Dave Stokes)
Do you ever wonder what your database server is REALLY doing with that query you just wrote? This is a high-level overview of the process of running a query.
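One concrete way to peek at what the server does with a query is to ask it for its plan. The sketch below uses SQLite (stdlib, purely as a stand-in; the talk itself presumably uses MySQL's `EXPLAIN`) to show whether an index is chosen for a lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_name ON users (name)")

# Ask the engine how it intends to execute the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE name = ?", ("alice",)
).fetchall()
for row in plan:
    print(row)  # the detail column mentions idx_name when the index is used
```

The same habit (read the plan before trusting the query) transfers directly to MySQL, Postgres, and most other engines.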
Hadoop + Cassandra: Fast queries on data lakes, and a Wikipedia search tutorial (Natalino Busa)
Today’s services rely on massive amounts of data to be processed, but are required at the same time to be fast and responsive. Building fast services on big data batch-oriented frameworks is definitely a challenge. At ING, we have worked on a stack that can alleviate this problem. Namely, we materialize data models by map-reducing Hadoop queries from Hive into Cassandra. Instead of sinking the results back to HDFS, we propagate the results into Cassandra key-value tables. Those Cassandra tables are finally exposed via an HTTP API front-end service.
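A toy, in-memory sketch of that materialization pattern (nothing here is ING's actual code; the names are invented): a batch step reduces raw events into a key-value table, and the serving step answers each request with a single lookup instead of re-running the batch query.

```python
from collections import defaultdict

# Raw "data lake" events, as a batch job would read them from Hive/HDFS.
events = [
    ("alice", 3), ("bob", 1), ("alice", 2), ("carol", 5),
]

# Batch step: reduce events into a key-value table (stand-in for Cassandra).
kv_table = defaultdict(int)
for user, count in events:
    kv_table[user] += count

# Serving step: the HTTP front-end answers each request with one lookup.
def get_user_total(user):
    return kv_table.get(user, 0)

print(get_user_total("alice"))  # 5
```

The latency win comes from moving all the heavy computation into the batch step, so the online path is O(1) per request.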
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...) (Michael Rys)
From theory to implementation - follow the steps of implementing an end-to-end analytics solution illustrated with some best practices and examples in Azure Data Lake.
During this full training day we will share the architecture patterns, tooling, learnings and tips and tricks for building such services on Azure Data Lake. We take you through some anti-patterns and best practices on data loading and organization, give you hands-on time and the ability to develop some of your own U-SQL scripts to process your data and discuss the pros and cons of files versus tables.
These were the slides presented at the SQLBits 2018 Training Day on Feb 21, 2018.
N1QL = SQL + JSON. N1QL gives developers and enterprises an expressive, powerful, and complete language for querying, transforming, and manipulating JSON data. We begin with a brief overview. Couchbase 5.0 has language and performance improvements for pagination, index exploitation, integration, and more. We’ll walk through scenarios, features, and best practices.
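To make the "SQL + JSON" idea concrete, here is a hypothetical N1QL-style query and a plain-Python rendering of its semantics over the same documents (the query text, bucket name, and fields are illustrative only, not from the session):

```python
# Hypothetical N1QL-style query:
#   SELECT name, city FROM travel WHERE country = 'France' ORDER BY name
docs = [
    {"name": "Lyon Hotel", "city": "Lyon", "country": "France"},
    {"name": "Berlin Inn", "city": "Berlin", "country": "Germany"},
    {"name": "Arles B&B", "city": "Arles", "country": "France"},
]

# The same filter / project / order steps expressed over JSON documents.
result = sorted(
    ({"name": d["name"], "city": d["city"]}
     for d in docs if d["country"] == "France"),
    key=lambda r: r["name"],
)
print(result)  # Arles B&B first, then Lyon Hotel
```

The point of N1QL is that the declarative form on top scales to nested documents, joins, and indexes, while keeping familiar SQL semantics.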
Couchbase Overview - Monterey Bay Information Technologists Meetup 02.15.17 (Aaron Benton)
Couchbase Server is a NoSQL document database with a distributed architecture for performance, scalability, and availability. It enables developers to build applications easier and faster by leveraging the power of SQL with the flexibility of JSON.
Aaron Benton is an Applications Architect at SHOP.COM. He has used Couchbase in a number of different applications and will share his experience with the product.
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5 (Keshav Murthy)
N1QL gives developers and enterprises an expressive, powerful, and complete language for querying, transforming, and manipulating JSON data. We’ll begin this session with a brief overview of N1QL and then explore some key enhancements we’ve made in the latest versions of Couchbase Server. Couchbase Server 5.0 has language and performance improvements for pagination, index exploitation, integration, index availability, and more. Couchbase Server 5.5 will offer even more language and performance features for N1QL and global secondary indexes (GSI), including ANSI joins, aggregate performance, index partitioning, auditing, and more. We’ll give you an overview of the new features as well as practical use case examples.
NoSQL databases are really popular in the Big Data landscape, but SQL semantics are taking their revenge. Instead of learning many DSLs, developers prefer to use the well-known and universal SQL query language, so roughly all big data solutions are being forced to support SQL semantics over their data models.
From Document to Graph DBs, from search to streaming platforms, all the ways to query Big data through SQL.
NewSQL - Deliverance from BASE and back to SQL and ACID (Tony Rogerson)
There are a number of NewSQL products now on the market, such as VoltDB and Postgres-XL. These promise NoSQL performance and scalability but with ACID and relational concepts implemented with ANSI SQL.
This session will cover why NoSQL came about, why it's had its day, and why NewSQL will become the backbone of the Enterprise for OLTP and Analytics.
MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. You'll also hear from Dan Wagner, CEO at Civis Analytics, as he discusses why the Civis data science platform was designed on top of Amazon Redshift and the AWS platform in order to help smart organizations bridge their data silos, build a 360-degree view of their customer relationships, and identify opportunities for driving their companies forward by leveraging enormous datasets, the power of analytics, and economies of scale on the AWS platform.
Machine Learning Essentials Demystified part 2 | Big Data Demystified (Omid Vahdaty)
Machine Learning Essentials Abstract:
Machine Learning (ML) is one of the hottest topics in the IT world today. But what is it really all about?
In this session we will talk about what ML actually is and in which cases it is useful.
We will talk about a few common algorithms for creating ML models and demonstrate their use with Python. We will also take a peek at Deep Learning (DL) and Artificial Neural Networks, explain how they work (without too much math), and demonstrate a DL model with Python.
The target audience is developers, data engineers, and DBAs who do not have prior experience with ML and want to know how it actually works.
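As a small taste of the kind of model such a session builds (a made-up toy example, not the session's actual demo): fitting y ≈ w·x with plain gradient descent, the core loop behind many ML algorithms.

```python
# Toy training data generated from y = 2x; training should recover w ≈ 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0    # model parameter, starts untrained
lr = 0.01  # learning rate
for _ in range(1000):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

print(round(w, 3))  # ≈ 2.0
```

Swap the single parameter for a vector and the line for a network, and this same loop is what libraries like scikit-learn and TensorFlow automate.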
Machine Learning Essentials Demystified part 1 | Big Data Demystified (Omid Vahdaty)
The technology of fake news between a new front and a new frontier | Big Dat... (Omid Vahdaty)
My name is Nitzan Or Kadrai, and I stand at the interesting intersection of technology, media, and activism.
For the past four and a half years I have been working at Yedioth Ahronoth, first as the product manager of the ynet app and today as head of innovation.
I was a partner in founding the Start-Ach nonprofit, which provides development and product services for other nonprofits, and recently I have been building a community whose goal is to explore the technological aspects of the fake news phenomenon and to build applicative tools for smart management of the fight against it.
The talk will cover the fake news phenomenon. We will focus on the technology that enables the spread of fake news and look at examples of how this technology is used.
We will examine the scope of the phenomenon on social networks and learn how the tech giants are trying to fight it.
Big Data in 200 km/h | AWS Big Data Demystified #1.3 (Omid Vahdaty)
What we're about
A while ago I entered the challenging world of Big Data. As an engineer, at first I was not so impressed with this field. As time went by, I realised more and more that the technological challenges in this area are too great for one person to master. Just look at the picture in this article; it covers only a small fraction of the technologies in the Big Data industry…
Consequently, I created a meetup detailing all the challenges of Big Data, especially in the world of cloud. I am using AWS infrastructure to answer the basic questions of anyone starting their way in the big data world.
• How to transform data (TXT, CSV, TSV, JSON) into Parquet or ORC
• Which technology should we use to model the data? EMR? Athena? Redshift? Spectrum? Glue? Spark? SparkSQL?
• How to handle streaming?
• How to manage costs?
• Performance tips
• Security tips
• Cloud best practices
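The first bullet above can be sketched in miniature with only the standard library: parsing CSV rows into typed JSON records. (Writing actual Parquet or ORC typically goes through pandas/pyarrow or Spark, which is omitted here; the column names are invented.)

```python
import csv
import io
import json

# Stand-in CSV input; in practice this would be a file on S3 or HDFS.
raw = "user,clicks\nalice,3\nbob,7\n"

records = [
    {"user": row["user"], "clicks": int(row["clicks"])}
    for row in csv.DictReader(io.StringIO(raw))
]
print(json.dumps(records))  # typed records ready for a columnar writer
```

The typing step (here, `int(...)`) is the part that matters: columnar formats like Parquet need a schema, so the conversion is where you decide it.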
Some of our online materials:
Website:
https://big-data-demystified.ninja/
Youtube channels:
https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber
https://www.youtube.com/channel/UCMSdNB0fGmX5dXI7S7Y_LFA?view_as=subscriber
Meetup:
https://www.meetup.com/AWS-Big-Data-Demystified/
https://www.meetup.com/Big-Data-Demystified
Facebook Group :
https://www.facebook.com/groups/amazon.aws.big.data.demystified/
Facebook page (https://www.facebook.com/Amazon-AWS-Big-Data-Demystified-1832900280345700/)
Audience:
Data Engineers
Data Science
DevOps Engineers
Big Data Architects
Solution Architects
CTO
VP R&D
Making your analytics talk business | Big Data Demystified (Omid Vahdaty)
MAKING YOUR ANALYTICS TALK BUSINESS
Aligning your analysis to the business is fundamental for all types of analytics (digital or product analytics, business intelligence, etc.) and is vertical- and tool-agnostic. In this talk we will build on the discussion that was started in the previous meetup, and will discuss how analysts can learn to derive their stakeholders' expectations, how to shift from metrics to "real" KPIs, and how to approach an analysis in order to create real impact.
This session is primarily geared towards those starting out into analytics, practitioners who feel that they are still struggling to prove their value in the organization or simply folks who want to power up their reporting and recommendation skills. If you are already a master at aligning your analysis to the business, you're most welcome as well: join us to share your experiences so that we can all learn from each other and improve!
Bios:
Eliza Savov - Eliza is the team lead of the Customer Experience and Analytics team at Clicktale, the worldwide leader in behavioral analytics. She has extensive experience working with data analytics, having previously worked at Clicktale as a senior customer experience analyst, and as a product analyst at Seeking Alpha.
BI STRATEGY FROM A BIRD'S EYE VIEW (How to become a trusted advisor) | Omri H... (Omid Vahdaty)
In the talk we will discuss how to break down the company’s overall goals all the way to your BI team’s daily activities in 3 simple stages:
1. Understanding the path to success - Creating a revenue model
2. Gathering support and strategizing - Structuring a team
3. Executing - Tracking KPIs
Bios:
Omri Halak - Omri is the director of business operations at Logz.io, an intelligent and scalable machine data analytics platform built on ELK & Grafana that empowers engineers to monitor, troubleshoot, and secure mission-critical applications more effectively. In this position, Omri combines actionable business insights from the BI side with fast and effective delivery on the Operations side. Omri has ample experience connecting data with business, with previous positions at SimilarWeb as a business analyst, at Woobi as finance director, and as Head of State Guarantees at the Israel Ministry of Finance.
AI and Big Data in Health Sector Opportunities and challenges | Big Data Demy... (Omid Vahdaty)
The lecturer has deep experience defining cloud computing security models for IaaS, PaaS, and SaaS architectures, specifically as the architecture relates to IAM, and deep experience defining privacy protection policy; a big fan of GDPR interpretation.
Deep experience in information security, defining healthcare security best practices including AI and Big Data, IT security, and ICS security and privacy controls in industrial environments.
Deep knowledge of security frameworks such as the Cloud Security Alliance (CSA), International Organization for Standardization (ISO), National Institute of Standards and Technology (NIST), IBM ITCS104, etc.
What Will You learn:
Every day, the website collects a huge amount of data. The data makes it possible to analyze the behavior of Internet users, their interests, their purchasing behavior, and conversion rates. To grow the business, big data offers the tools to analyze and process data in order to reveal the competitive advantages hidden in it.
What does healthcare have to do with Big Data?
How can AI assist in patient care?
Why are some afraid? Are there any dangers?
Aerospike meetup July 2019 | Big Data Demystified (Omid Vahdaty)
Building a low latency (sub millisecond), high throughput database that can handle big data AND linearly scale is not easy - but we did it anyway...
In this session we will get to know Aerospike, an enterprise distributed primary key database solution.
- We will do an introduction to Aerospike: basic terms, how it works, and why it is widely used in mission-critical system deployments.
- We will understand the 'magic' behind Aerospike's ability to handle small, medium, and even petabyte-scale data while still guaranteeing predictable sub-millisecond latency.
- We will learn how Aerospike DevOps differs from other solutions in the market, and see how easy it is to run it in cloud environments as well as on premises.
We will also run a demo, showing a live example of the performance and self-healing technologies the database has to offer.
ALIGNING YOUR BI OPERATIONS WITH YOUR CUSTOMERS' UNSPOKEN NEEDS, by Eyal Stei... (Omid Vahdaty)
ALIGNING YOUR BI OPERATIONS WITH YOUR CUSTOMERS' UNSPOKEN NEEDS
-Learn how to connect BI and product management to solve business problems
-Discover how to lead clients to ask the right questions to get the data and insight they really want
-Get pointers on saving your time and your company's resources by understanding what your customers need, not what they ask for
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned (Omid Vahdaty)
A while ago I entered the challenging world of Big Data. As an engineer, at first I was not so impressed with this field. As time went by, I realised more and more that the technological challenges in this area are too great for one person to master. Just look at the picture in this article; it covers only a small fraction of the technologies in the Big Data industry…
Consequently, I created a meetup detailing all the challenges of Big Data, especially in the world of cloud. I am using AWS & GCP and Data Center infrastructure to answer the basic questions of anyone starting their way in the big data world.
• How to transform data (TXT, CSV, TSV, JSON) into Parquet, ORC, or AVRO
• Which technology should we use to model the data? EMR? Athena? Redshift? Spectrum? Glue? Spark? SparkSQL? GCS? BigQuery? Dataflow? Datalab? TensorFlow?
• How to handle streaming?
• How to manage costs?
• Performance tips
• Security tips
• Cloud best practices
In this meetup we present lecturers working with several cloud vendors, various big data platforms such as Hadoop, data warehouses, and startups working on big data products. Basically, if it is related to big data, this is THE meetup.
Some of our online materials (mixed content from several cloud vendors):
Website:
https://big-data-demystified.ninja (under construction)
Meetups:
https://www.meetup.com/Big-Data-Demystified
https://www.meetup.com/AWS-Big-Data-Demystified/
YouTube channels:
https://www.youtube.com/channel/UCMSdNB0fGmX5dXI7S7Y_LFA?view_as=subscriber
https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber
Audience:
Data Engineers
Data Science
DevOps Engineers
Big Data Architects
Solution Architects
CTO
VP R&D
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r... (Omid Vahdaty)
AWS Big Data Demystified is all about knowledge sharing, because knowledge should be given for free. In this lecture we will discuss the advantages of working with Zeppelin + Spark SQL, JDBC + Thrift, Ganglia, R + SparkR + Livy, and a little bit about Ganglia on EMR.
Subscribe to our YouTube channel to see the video of this lecture:
https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber
Amazon AWS Big Data Demystified | Introduction to streaming and messaging flu... (Omid Vahdaty)
Amazon AWS Big Data Demystified meetup:
https://www.meetup.com/AWS-Big-Data-Demystified/
Introduction to streaming and messaging: Flume, Kafka, SQS, Kinesis
AWS Big Data Demystified #1: Big data architecture lessons learned (Omid Vahdaty)
AWS Big Data Demystified #1: Big data architecture lessons learned. A quick overview of the big data technologies that were selected or disregarded in our company.
The video: https://youtu.be/l5KmaZNQxaU
Don't forget to subscribe to the YouTube channel.
The website: https://amazon-aws-big-data-demystified.ninja/
The meetup : https://www.meetup.com/AWS-Big-Data-Demystified/
The facebook group : https://www.facebook.com/Amazon-AWS-Big-Data-Demystified-1832900280345700/
Immunizing Image Classifiers Against Localized Adversary Attacks (gerogepatton)
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks (CNNs), to adversarial attacks, and presents a proactive training technique designed to counter them. We introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations. When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves the immunity of models against localized universal attacks, by up to 40%. We evaluate our proposed approach using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing accuracy improvements over previous techniques. The results indicate that the combination of the volumetric input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating adversary training.
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on Binary Heap data structure. It is similar to the selection sort where we first find the minimum element and place the minimum element at the beginning. Repeat the same process for the remaining elements.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
3. 500+ Digital Businesses Run on Couchbase
• 6 of the Top 10 E-Commerce Companies in the US
• 6 of the Top 10 US & European Broadcast Companies
• 6 of the Top 10 Online Casino Gaming Companies
• The Top 3 Credit Reporting Companies
• The Top 3 GDS Companies
• 3 of the Top 10 Airlines
5. Couchbase = K/V + Document DB + …
• Couchbase is a hybrid engine system:
  • Super-fast K/V engine, based on the Memcached distributed cache
  • Document DB engine that uses the K/V engine for super-fast performance, with an ANSI SQL-like language on JSON data
• A distributed cache & a database – IN ONE PLATFORM
6. Why use a document based DB?
• Flexible schema = faster development
• No code impedance – the data structure in the database matches the data structure in your code
• Easy & fast deployments
• Easy maintenance
• Best fit for microservices architecture
So why is RDBMS still so popular? SQL.
7. The Power of the Flexible JSON Schema
• Ability to store data in multiple ways:
  o A denormalized single document, as opposed to normalizing data across multiple tables
  o A dynamic schema to add new values when needed
8. Efficient Sub-Document Operations
• Document mutations:
  • Atomic operations on individual fields
  • Identical syntax and behavior to regular bucket methods (upsert, insert, get, replace)
  • Support for JSON fragments
  • Support for arrays with uniqueness guarantees and ordinal placement (front/back)
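The semantics of a sub-document mutation can be sketched in plain Python (an illustrative model of the behavior, not the Couchbase SDK API): a dotted path such as "address.city" addresses a single field inside the stored JSON, and only that field is changed.

```python
def subdoc_upsert(doc: dict, path: str, value) -> None:
    """Upsert one field addressed by a dotted path, leaving the rest of
    the document untouched (models sub-document mutation semantics)."""
    parts = path.split(".")
    node = doc
    for key in parts[:-1]:
        node = node.setdefault(key, {})  # create intermediate objects as needed
    node[parts[-1]] = value

profile = {"name": "Ada", "address": {"city": "London", "zip": "N1"}}
subdoc_upsert(profile, "address.city", "Cambridge")
# Only address.city changed; name and address.zip are untouched.
```

In the real server the whole mutation is applied atomically, so two concurrent sub-document writers on different fields never clobber each other's changes.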
9. Nickel (N1QL): SQL-Like Querying Support
• SQL-like query language
• Expressive, familiar, and feature-rich language for querying, transforming, and manipulating JSON data
• ANSI 92 SQL compatible – selects, inserts, updates, GROUP BY, sorting, functions, etc.
• N1QL extends SQL to handle data that is:
  • Nested: contains nested objects and arrays
  • Heterogeneous: schema-optional, non-uniform
  • Distributed: partitioned across a cluster
The flexibility of JSON with the power of SQL.
14. Cache Ejection
[Diagram: the application server talks to the managed cache; documents flow through the disk queue and the replication queue before being persisted to disk.]
• A single node type means easier administration and scaling
• Layer consolidation means a read-through and write-through cache
• Couchbase automatically removes data that has already been persisted from RAM
15. Cache Miss
[Diagram: a GET for DOC 1 misses the managed cache, so the document is fetched from disk into RAM and returned to the application server.]
• A single node type means easier administration and scaling
• Layer consolidation means one single interface for the app to talk to and get its data back as fast as possible
• Separation of cache and disk allows for the fastest access out of RAM while pulling data from disk in parallel
16. Persistence
• Guards against most forms of failure
• Protects against data loss
• Configurable durability
• Always-on availability
17. Auto Sharding – Buckets and vBuckets
• A bucket is a logical, unique key space
• Multiple buckets can exist within a single cluster of nodes
• Each bucket has active and replica data sets (1, 2 or 3 extra copies)
• Each data set has 1024 virtual buckets (vBuckets)
• Each vBucket contains 1/1024th of the data set
• vBuckets do not have a fixed physical server location
• The mapping between vBuckets and physical servers is called the cluster map
• Document IDs (keys) always hash to the same vBucket
• Couchbase SDKs look up the vBucket -> server mapping
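The key-to-vBucket step can be sketched as a stable hash modulo 1024. This is a simplification: the real SDKs use a specific CRC32 variant, and the three-server cluster map below is invented for illustration; in practice the cluster publishes the map and updates it on rebalance and failover.

```python
import zlib

NUM_VBUCKETS = 1024

def vbucket_for_key(key: str) -> int:
    """Hash a document key to one of 1024 vBuckets (illustrative sketch)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS

# Toy cluster map: vBucket id -> owning server (hypothetical 3-node cluster).
cluster_map = {vb: f"server-{vb % 3}" for vb in range(NUM_VBUCKETS)}

vb = vbucket_for_key("customer::1234")
server = cluster_map[vb]
# The same key always hashes to the same vBucket, so the SDK can route
# the request directly to the node that currently owns that vBucket.
```

Because only the cluster map changes when nodes are added or removed, clients just refresh the map instead of rehashing any data.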
20. Rebalance
[Diagram: active and replica vBuckets (shards 1–9) spread across Couchbase Servers 1–3; when Servers 4 and 5 are added, shards are redistributed across all five nodes while reads/writes/updates continue.]
21. Fail over
[Diagram: when a node fails, the replica copies of its active vBuckets on the surviving Couchbase Servers are promoted to active, so all shards (1–9) remain available.]
22. Elastic Scalability
• Linear scalability by adding nodes
• Multi-Dimensional Scaling (MDS)
• Extremely easy scaling
23. Multi Dimensional Scaling (MDS)
[Diagram: Nodes 1–14, each running a subset of services under a shared Cluster Manager – Data, Query, Global Index, Full Text, Eventing, Analytics.]
• Data service: managed cache, key-value store, document database, mobile
• Query service: N1QL queries
• Full Text Search and Analytics run as separate, independently scalable services
24. Replication
• High availability from node failures
• Disaster recovery from data center failures with XDCR (Cross Data Center Replication)
• Supports active-active replication between data centers
27. Query
• Using SQL for JSON, called N1QL
• If you know SQL, N1QL will look extremely familiar
• Support for ANSI joins, aggregations, subqueries, ordering, etc.
31. N1QL (Example)
SELECT customers.id,
       customers.NAME.lastname,
       customers.NAME.firstname,
       SUM(orderline.amount)
FROM orders UNNEST orders.lineitems AS orderline
JOIN customers ON KEYS orders.custid
WHERE customers.state = 'NY'
GROUP BY customers.id,
         customers.NAME.lastname
HAVING SUM(orderline.amount) > 10000
ORDER BY SUM(orderline.amount) DESC
Notes: dotted sub-document references; names are CASE-SENSITIVE; UNNEST flattens the arrays; the JOIN uses the document KEY of customers.
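UNNEST turns each element of a document's lineitems array into its own output row, pairing it with its parent document. The effect can be modeled in a few lines of Python (the order documents here are hypothetical, for illustration only):

```python
orders = [
    {"custid": "c1", "lineitems": [{"amount": 7000}, {"amount": 5000}]},
    {"custid": "c2", "lineitems": [{"amount": 100}]},
]

# UNNEST: one output row per (order, lineitem) pair
rows = [(o["custid"], item["amount"])
        for o in orders
        for item in o["lineitems"]]
# rows == [("c1", 7000), ("c1", 5000), ("c2", 100)]
```

After flattening, standard GROUP BY / SUM aggregation applies to the rows exactly as it would to a relational table.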
32. Query Execution Flow
1. Application submits N1QL query
2. Query is parsed and analyzed, and a plan is created
33. Query Execution Flow
3. Query Service makes a request to the Index Service
4. Index Service returns document keys and data
34. Query Execution Flow
5. If using a covering index, skip step 6
6. If filtering is required, fetch documents from the Data Service
35. Query Execution Flow
7. Apply final logic (e.g. SORT, ORDER BY)
8. Return formatted results to the application
36. Data Modification Statements
• UPDATE … SET … WHERE …
• DELETE FROM … WHERE …
• INSERT INTO … ( KEY, VALUE ) VALUES …
• INSERT INTO … ( KEY …, VALUE … ) SELECT …
• MERGE INTO … USING … ON … WHEN [ NOT ] MATCHED THEN …
Note: Couchbase provides per-document atomicity.
37. Data Modification Statements
INSERT INTO ORDERS (KEY, VALUE)
VALUES ("1.ABC.X382", {"O_ID":482, "O_D_ID":3, "O_W_ID":4});

UPDATE ORDERS
SET O_CARRIER_ID = "ABC987"
WHERE O_ID = 482 AND O_D_ID = 3 AND O_W_ID = 4;

DELETE FROM NEW_ORDER
WHERE NO_D_ID = 291 AND NO_W_ID = 3482 AND NO_O_ID = 2483;

JSON literals can be used in any expression.
39. ANSI Window Functions
● Couchbase is the first NoSQL database to support ANSI window functions
● Answer common but complex business queries with minimal lines of code and optimized performance
● Sample use cases:
  ○ revenue growth month over month
  ○ top N sale districts by revenue for a given week
  ○ ranking of salespeople by region based on revenue booked
With ANSI window functions, developers can simplify financial and statistical aggregations in an easy and optimized way.
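The month-over-month use case relies on LAG(), which pairs each row with the value from the previous row in the window's ORDER BY sequence. A minimal Python model of that semantics (revenue figures are hypothetical):

```python
revenue = [("2018-01", 100.0), ("2018-02", 120.0), ("2018-03", 90.0)]

# LAG(value) OVER (ORDER BY month): previous row's value, None (NULL)
# for the first row, then growth = (current - previous) / previous.
growth = []
prev = None
for month, value in sorted(revenue):
    pct = None if prev is None else round((value - prev) / prev, 2)
    growth.append((month, pct))
    prev = value
# growth == [("2018-01", None), ("2018-02", 0.2), ("2018-03", -0.25)]
```

In N1QL the same result comes from a single expression in the SELECT list, without the explicit loop and carried state shown here.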
40. ANSI Common Table Expressions (CTE)
● A CTE allows developers to isolate a SQL statement into a temporary named result set that can be referenced as a source table in the context of a larger query
● Offers the advantages of readability and ease of maintenance for complex queries, without compromising performance
● Couchbase is the first NoSQL database to support ANSI CTEs
With ANSI Common Table Expressions, developers gain ease of maintenance and better readability by naming temporary SQL statements.
WITH current_period_task AS (
  SELECT DATE_TRUNC_STR(a.startDate,'month') AS month,
         COUNT(1) AS current_period_task_count
  FROM crm a
  WHERE a.type = 'activity' AND a.activityType = 'Task'
    AND DATE_PART_STR(a.startDate,'year') = 2018
  GROUP BY DATE_TRUNC_STR(a.startDate,'month')
),
last_period_task AS (
  SELECT x.month, x.current_period_task_count,
         LAG(x.current_period_task_count) OVER (ORDER BY x.month)
           AS last_period_task_count
  FROM current_period_task x
)
SELECT b.month,
       b.current_period_task_count,
       ROUND(((b.current_period_task_count - b.last_period_task_count) /
              b.last_period_task_count), 2)
FROM last_period_task AS b
41. User-Defined Functions (Developer Preview)
CREATE FUNCTION getsalestax(state, city)
DROP FUNCTION getsalestax
EXECUTE FUNCTION getsalestax("CA", "Santa Clara")
SELECT getsalestax(state, city) FROM invoice
● Allow developers to define custom functions in JavaScript (similar to PL/SQL) callable from N1QL queries, with an interactive debugger to simplify development and testing
● Server-side logic that can be reused by any application and microservice, improving code maintenance and developer productivity
● Improve code performance by bringing application logic closer to the data
With user-defined functions, developers define callable custom functions to ease development, simplify reusability and improve code performance.
42. Cost-Based Optimizer (Developer Preview)
● A cost-based optimizer generates the optimal access path based on statistics and metadata collected on the data
● Eliminates the time spent tweaking a query and providing optimizer hints to get the rule-based optimizer to pick the right query plan
● Leverages decades of research and experience in query optimization to collect and use statistics on JSON, arrays, and objects
● Couchbase is the first NoSQL database with a dynamic schema to support a cost-based optimizer
With the cost-based optimizer, queries run faster by using the optimal access path, without the need to provide optimizer hints.
43. Index Advisor (Developer Preview)
● The Index Advisor suggests appropriate indexes to speed up a given query, taking the guesswork out of query tuning
● It can also monitor and analyze the statistics collected from a running workload and suggest the indexes that will speed up the queries in that workload
● It greatly reduces the complexity and effort required for enterprise developers and operations engineers to determine the right indexes to speed up their queries
With the Index Advisor, developers can create better indexes based on its suggestions and speed up queries easily.
45. Indexing
• Quick and efficient access to your data
• No need to scan all the documents
• Support for filtered indexes, composite indexes and covering indexes
46. Index Options
1. Primary Index – Index on the document key over the whole bucket
2. Named Primary Index – A named primary index; allows multiple primary indexes in the cluster
3. Secondary Index – Index on a key-value or the document key
4. Secondary Composite Index – Index on more than one key-value
5. Functional Index – Index on a function or expression over key-values
6. Array Index – Indexes individual elements of arrays
7. Covering Index – The query can be answered using the data from the index alone, skipping retrieval of the item
8. Adaptive Index – A special type of GSI array index that can index all or specified fields of a document
9. Replica Index – Index replication that allows load balancing, providing scale-out, multi-dimensional scaling, performance, and high availability
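The covering-index idea can be modeled in a few lines of Python: if the index entries already store every field the query needs, the document itself is never fetched. The user documents below are hypothetical, for illustration only.

```python
documents = {
    "u1": {"name": "Ada",   "city": "London", "bio": "..."},
    "u2": {"name": "Grace", "city": "NYC",    "bio": "..."},
}

# A "covering" index on (city, name): each entry carries both fields.
index_city_name = {}
for key, doc in documents.items():
    index_city_name.setdefault(doc["city"], []).append((key, doc["name"]))

# SELECT name WHERE city = 'London' is answered from the index alone:
names = [name for _key, name in index_city_name.get("London", [])]
# No lookup into `documents` was needed – the index "covers" the query.
```

This is why step 6 of the query execution flow (fetching documents from the Data Service) can be skipped when a covering index is available.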
48. Full Text Search
• Search within text – extremely fast
• Language aware (supports 19 languages)
• Result-scoring mechanism
• Rich querying capabilities
50. Full Text Search - Capabilities
Query:
• Basic: Match, Match Phrase, Fuzzy, Prefix, Regexp, Wildcard, Boolean Field
• Compound: QueryString, Boolean, Conjunction, Disjunction
• Range: DateRange, NumericRange
• Special purpose: DocID, MatchAll, MatchNone, Phrase, Term, Geospatial
• Scoring (TF/IDF), boosting, field scoping
• New/DP: TermRange, Geospatial
Indexing:
• Real-time indexing (inverted index, auto-updated upon mutation)
• Default map and map by document type
• Dynamic mapping
• Stored fields, term vectors
• Analyzers: tokenization, token filtering (stop-word removal, language-specific stemming)
• Aliasing
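The inverted index at the heart of full text search maps each term to the set of documents containing it, which is what makes term lookups fast. A toy sketch (the corpus is invented, and the whitespace tokenizer stands in for the real analyzers, which also stem and filter stop words):

```python
docs = {
    "b1": "fruity pale ale",
    "b2": "dark stout",
    "b3": "fruity wheat beer",
}

# Build the inverted index: term -> set of document ids.
inverted = {}
for doc_id, text in docs.items():
    for term in text.split():  # trivial tokenizer, for illustration
        inverted.setdefault(term, set()).add(doc_id)

# A term query is now a dictionary lookup, not a scan of every document:
hits = inverted.get("fruity", set())
# hits == {"b1", "b3"}
```

Auto-updating on mutation means each document write also updates the posting sets for its terms, so queries always see a current index.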
51. SEARCH Predicates in Query (Full Text Search)
● Couchbase provides a comprehensive search capability in queries, beyond the simple LIKE() operator found in most databases
● The SEARCH() operator supports keyword and fuzzy matching across multiple document fields
● Developers do not have to write complex code to process and combine the results from separate SQL and search queries
● Better query performance with inverted indexes for search predicates instead of inefficient scans for LIKE()

SELECT * FROM `beer-sample` b
WHERE SEARCH(b.desc, "fruity") AND b.abv < 0.5;

With search predicates, developers can combine SQL and search queries for powerful integration and better query performance.
52. Mobile
• Full-stack platform for mobile and IoT apps
• Real-time automatic sync – built in
• Fully secured
• Data can be accessed when offline
53. Couchbase - The Data Platform for Mobile Engagement
COUCHBASE LITE (Client): a lightweight embedded NoSQL database with full CRUD and query functionality.
SYNC GATEWAY (Middle Tier): a secure web gateway with synchronization, data access, and data integration APIs for accessing, integrating, and synchronizing data over the web.
COUCHBASE SERVER (Storage): a highly scalable, highly available, high-performance NoSQL database server.
Security: built-in enterprise-level security throughout the entire stack, including user authentication, user- and role-based data access control (RBAC), secure transport (TLS), and 256-bit AES full database encryption.
54. Eventing
• Triggers user-defined business logic in real time
• Runs on the cluster
• Logic is written in JavaScript (V8)
55. Analytics
• Powerful parallel query processing over JSON
• Made for long-running, complex SQL-like queries
• Complete workload isolation – does not affect operational data processing
57. Container and Cloud Deployment
• Couchbase replication is zone and region aware – ideal for cloud deployments
• Pre-built modules for all major cloud vendors
• Support for containers
• Automation with the Couchbase Autonomous Operator for Kubernetes
• Support for Red Hat OpenShift
60. Distributed ACID Transactions
WHY
With distributed ACID transactions, developers simplify application logic by relying on all-or-nothing semantics for durably modifying multiple documents distributed across different nodes.
Transition from an RDBMS schema:
● De-normalization has limitations
Multi-asset coordination:
● Transfer from one user to another
● Reservation items such as flights
Business/application-level transactions that need to modify multiple documents (all-or-nothing):
● Microservices SAGAs – event-based orchestration
transactions.run((txnctx) -> {
    // Insert a document
    JsonDocument doc1 = JsonDocument.create("newDoc", JsonObject.create());
    txnctx.insert(bucket, doc1);

    // Replace a document
    TransactionJsonDocument doc2 = txnctx.getOrError(bucket, "doc2");
    doc2.content().put("name", "bob");
    txnctx.replace(doc2);

    // Commit transaction
    txnctx.commit();
});
61. ACID in Couchbase
Couchbase Server 6.5 provides strong ACID guarantees while balancing scalability, availability and performance.
A – Atomicity: guarantees all-or-nothing semantics for updating multiple documents in more than one shard on different nodes.
C – Consistency: replicas are strongly consistent for the chosen durability level; indexes and XDCR clusters are eventually consistent.
I – Isolation: read-committed isolation for concurrent readers, as it provides strong semantics without compromising availability, scalability, and performance. Applications can specify a scan consistency level for performance.
D – Durability: data protection under failures, with 3 levels – replicate to a majority of the nodes; replicate to a majority and persist to disk on the primary; or persist to disk on a majority of the nodes.