This presentation was delivered by Scandit CEO Samuel Mueller in the summer of 2013 at mPOS World in Frankfurt, Germany. It provides an overview of the changing point-of-sale (POS) landscape and insights into how mobile barcode scanning and data capture technology enable the transition to mobile point of sale (mPOS).
Enterprise-grade mobile barcode scanning with Scandit and Xamarin | Xamarin
Scandit's lightning-fast and accurate Barcode Scanner is a valuable addition to any enterprise application. Watch Zack Gramama, Technical Lead, Xamarin Component Store, and Christian Floerkemeier, CTO and co-founder of Scandit, demonstrate how the Scandit component uses a distinctive blurry-barcode scanning technology that works across platforms to scan any barcode type from any angle.
Democratizing Business Processes with Android-based Mobile Devices | Scandit
This presentation was given in April 2013 at DroidCon in Berlin, Germany. In it, Scandit COO Christof Roduner covers Scandit's products and services, recent enterprise IT trends, Android technology and its challenges, and a variety of scenarios in which smartphone-based barcode scanning can add value to business processes.
Smartphone penetration has surged in recent years, particularly since the launch of the Apple iPhone in 2007 and even more so since early 2009, when rival devices began to take hold in the market and handset prices fell to mass-market levels. Smartphone penetration currently stands at 60%. Tablets are also becoming increasingly popular with technology lovers: 17.4 million tablets were activated worldwide on Christmas Day 2012, and one retailer sold around five every second. According to figures released at the start of January 2013, tablet revenue is quickly catching up with smartphone revenue and is expected to overtake it in 2013.
The UK app market is by far the biggest in Europe. It outstrips its European neighbours in revenue from app downloads and in-app advertising, as well as in the sheer number of downloads and users. With well over 1 million apps available across app stores, we need to look at how apps can be used to support Food & Grocery retailing.
This report will look at:
-Current smartphone and tablet statistics (ownership and usage)
-The app market in the UK and around the world
-How shoppers find new apps
-Apps currently available for UK grocery retailers, recipes, vouchers and shopping lists, and how many shoppers currently have them on their smartphone or tablet
-The features shoppers say they would like in a grocery shopping app
Shopper Showrooming: Retailer Strategies in a Smartphone World | Self-employed
In his Shopper Marketing Asia 2014 talk, our CEO Arthur Policarpio discusses showrooming and reverse showrooming. He also offers strategies retailers can use to respond by integrating mobile into the shopping journey.
Teaching profession: Why I have chosen teaching as a profession | Hina Honey
The document discusses why the presenter chose the teaching profession. It provides definitions of teaching and lists the personal qualities and types of teachers. The top ten reasons for teaching include influencing the future, job security, and summers off. Great thinkers like Aristotle and Einstein emphasized the importance of teaching. The presenter chose teaching because it is a noble profession, allows influencing students, and can reform society. Teaching also provides an opportunity to impart knowledge and earn a halal living.
I'm going to cover something that could be seen as essential for Cassandra but hasn't received much attention in the Cassandra community and literature: schema migrations, that is, how you push out and version changes to your keyspace and table definitions across environments. This area has established solutions in the relational database world, with tools like Liquibase (http://www.liquibase.org/) and Flyway (http://flywaydb.org/), and in web frameworks like Rails and Grails.
I'll explain the different types of migrations but then focus, for most of the talk, on schema migrations. I'll explain how schema migrations have been done in the Cassandra community and the roadblocks teams have faced trying to use Liquibase and Flyway to manage Cassandra migrations.
Then I'll share an elegant, lightweight schema migrations system that we at GridPoint built on top of Flyway. I'll use our system as a context for discussing schema migration best practices for Cassandra and the various choices teams have for their migrations and table definitions, including when NOT to use a tool like Flyway. I'll also touch on the other types of migrations besides keyspace and table definitions that can be versioned and driven off source control.
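To make the idea concrete, here is a minimal Python sketch of what a Flyway-style runner does for Cassandra: it finds versioned CQL scripts, applies the pending ones in version order, and records what it applied. It is not GridPoint's system or Flyway's API. The `V<version>__<description>.cql` naming mirrors Flyway's file-naming convention, while the in-memory `applied` list (standing in for a schema-version table) and the naive statement splitting are simplifying assumptions.

```python
import re
from dataclasses import dataclass, field

# Flyway-style file name: V<version>__<description>.cql (description: letters, digits, _)
MIGRATION_NAME = re.compile(r"^V(\d+)__(\w+)\.cql$")


@dataclass
class MigrationRunner:
    # Assumption: stand-in for a schema-version table in Cassandra.
    applied: list = field(default_factory=list)
    # Statements that would be sent to Cassandra, in execution order.
    executed: list = field(default_factory=list)

    def pending(self, scripts: dict) -> list:
        """Return (version, file name) pairs not yet applied, sorted by version."""
        found = []
        for name in scripts:
            match = MIGRATION_NAME.match(name)
            if match is None:
                raise ValueError(f"not a versioned migration: {name}")
            version = int(match.group(1))
            if version not in self.applied:
                found.append((version, name))
        return sorted(found)

    def migrate(self, scripts: dict) -> list:
        """Apply every pending script in version order; return the names applied."""
        done = []
        for version, name in self.pending(scripts):
            # Naive split on ';' (breaks if a statement contains a literal ';').
            for statement in scripts[name].split(";"):
                if statement.strip():
                    self.executed.append(statement.strip())
            self.applied.append(version)
            done.append(name)
        return done


if __name__ == "__main__":
    scripts = {
        "V2__create_readings.cql": "CREATE TABLE demo.readings (id uuid PRIMARY KEY, value double);",
        "V1__create_keyspace.cql": (
            "CREATE KEYSPACE demo WITH replication = "
            "{'class': 'SimpleStrategy', 'replication_factor': 1};"
        ),
    }
    runner = MigrationRunner()
    print(runner.migrate(scripts))  # ['V1__create_keyspace.cql', 'V2__create_readings.cql']
    print(runner.migrate(scripts))  # []
```

Because each script runs once and is recorded, the same migration set can be replayed safely against every environment. A production runner would also store a checksum per script to detect edited migrations, as Flyway does.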
Cassandra is used as the backend database for Scandit's barcode and product scanning platform. It provides high scalability and availability needed to store large volumes of product data and scan data. Cassandra's data model uses a column family structure and allows storing data flexibly in column names. It is optimized for write-heavy workloads and scales easily by adding more nodes.
This document provides an introduction and overview of Cassandra and NoSQL databases. It discusses the challenges faced by modern web applications that led to the development of NoSQL databases. It then describes Cassandra's data model, API, consistency model, and architecture including write path, read path, compactions, and more. Key features of Cassandra like tunable consistency levels and high availability are also highlighted.
Cassandra stands out among big data products in its ability to handle optimized writes of large amounts of data while providing configurable fault tolerance and data integrity. Two popular libraries that let JVM developers leverage these capabilities are Hector and the recently open-sourced Astyanax. In this talk, Joe presents examples of storing time series data in a Cassandra data store using both libraries. There will be code! As an added bonus, a mechanism for unit testing with an embedded Cassandra client will be presented.
Code can be downloaded from https://github.com/jmctee/Cassandra-Client-Tutorial
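The talk's code targets Hector and Astyanax in Java. As a client-agnostic illustration of a common time-series layout, the Python sketch below buckets points into one row per series per day, with columns named by timestamp so they stay sorted within the row. The day-sized bucket and the `<series>:<YYYYMMDD>` row-key format are illustrative assumptions, not taken from the talk; a dict stands in for the column family.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class TimeSeriesStore:
    """In-memory stand-in for a Cassandra column family holding time series.

    Assumption: row key "<series>:<YYYYMMDD>" bounds each row to one day.
    Within a row the column name is the timestamp, so writing the same
    timestamp twice overwrites the earlier value.
    """

    def __init__(self):
        self.rows = defaultdict(dict)  # row key -> {timestamp: value}

    @staticmethod
    def row_key(series: str, ts: datetime) -> str:
        return f"{series}:{ts:%Y%m%d}"

    def write(self, series: str, ts: datetime, value: float) -> None:
        self.rows[self.row_key(series, ts)][ts] = value

    def slice(self, series: str, start: datetime, end: datetime) -> list:
        """Return (timestamp, value) pairs with start <= ts < end, oldest first.

        Only the day rows that overlap [start, end) are visited.
        """
        points = []
        day = start.replace(hour=0, minute=0, second=0, microsecond=0)
        while day < end:
            columns = self.rows.get(self.row_key(series, day), {})
            points.extend((ts, v) for ts, v in sorted(columns.items()) if start <= ts < end)
            day += timedelta(days=1)
        return points
```

Bounding rows by time keeps any single row from growing without limit, at the cost of reading several rows when a query spans more than one bucket.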
NoSQL - MongoDB: agility, scalability, performance. I am going to talk about the basics of NoSQL and MongoDB. Why do some projects require an RDBMS while others need a NoSQL database? What are the pros and cons of NoSQL versus SQL? How is data stored and transferred in MongoDB? What query language is used? How does MongoDB support high availability and automatic failover through replication? What is sharding, and how does it help with scalability? Finally, the newest levels of concurrency: collection-level and document-level.
Cassandra's data model is more flexible than typically assumed.
Cassandra allows consistency levels to be tuned to balance availability against consistency. It can be made strongly consistent when certain replication conditions are met, namely when the number of replicas read plus the number of replicas written exceeds the replication factor (R + W > RF).
Cassandra uses a row-oriented model in which each row is uniquely identified by a key and groups columns or super columns. Super column families allow columns to be grouped under a common name and are often used to denormalize data.
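The overlap condition can be checked with simple arithmetic: if a read contacts R replicas and a write is acknowledged by W of the RF replicas, the two sets must share at least one replica whenever R + W > RF. The Python sketch below applies this to a few standard consistency level names. It assumes a single data center (levels such as LOCAL_QUORUM are not modeled) and ignores edge cases such as failed or partially applied writes.

```python
# Replicas each consistency level must contact, for a replication factor rf.
# Assumption: single data center; LOCAL_* and EACH_* levels are not modeled.
def replicas_for(level: str, rf: int) -> int:
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level == "QUORUM":
        count = rf // 2 + 1  # a majority of replicas
    elif level == "ALL":
        count = rf
    elif level in fixed:
        count = fixed[level]
    else:
        raise ValueError(f"unsupported level: {level}")
    if count > rf:
        raise ValueError(f"{level} needs {count} replicas but rf is {rf}")
    return count


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set overlaps every write replica set (R + W > RF)."""
    return replicas_for(read_level, rf) + replicas_for(write_level, rf) > rf


if __name__ == "__main__":
    rf = 3
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, read_sees_latest_write(read, write, rf))
```

QUORUM for both reads and writes is a common middle ground: at RF = 3 it tolerates one unavailable replica while still satisfying the overlap condition.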
Cassandra's data model is query-based rather than domain-based. It focuses on answering questions through flexible querying rather than storing predefined objects. Design patterns like materialized views and composite keys can help support different types of queries.
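As an illustration of query-first design, suppose (hypothetically) that an application must look users up both by id and by email. Instead of one normalized table, it keeps one table per query, each keyed by what that query supplies, and writes every user to both. The Python sketch below models each table as a dict keyed by its partition key; in CQL these might be tables named `users_by_id` and `users_by_email`, which are hypothetical names. Where available, Cassandra's materialized views can maintain such a second table automatically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    name: str


class UserTables:
    """Two denormalized 'tables' (hypothetical names), one per query, synced on write."""

    def __init__(self):
        self.users_by_id = {}     # partition key: user_id
        self.users_by_email = {}  # partition key: email

    def save(self, user: User) -> None:
        old = self.users_by_id.get(user.user_id)
        # If the email changed, the row under the old key must be removed too.
        if old is not None and old.email != user.email:
            del self.users_by_email[old.email]
        self.users_by_id[user.user_id] = user
        self.users_by_email[user.email] = user

    def by_id(self, user_id: str) -> Optional[User]:
        return self.users_by_id.get(user_id)

    def by_email(self, email: str) -> Optional[User]:
        return self.users_by_email.get(email)
```

The cost is duplicated storage and multi-table writes; the benefit is that each read is answered from a single partition of a single table.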
This document provides an agenda for a presentation on integrating Apache Cassandra and Apache Spark. The presentation will cover RDBMS vs NoSQL databases, an overview of Cassandra including data model and queries, and Spark including RDDs and running Spark on Cassandra data. Examples will be shown of performing joins between Cassandra and Spark DataFrames for both simple and complex queries.
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,... | DataStax
EmoDB is an open source RESTful data store built on top of Cassandra that stores JSON documents and, most notably, offers a databus that allows subscribers to watch for changes to those documents in real time. It features massive non-blocking global writes, asynchronous cross-data-center communication, and schema-less JSON content.
For non-blocking global writes, we created a "JSON delta" specification that defines incremental updates to any JSON document. Each row in Cassandra is thus a sequence of deltas that serves as a Conflict-free Replicated Data Type (CRDT) for EmoDB's system of record. We introduce the concept of "distributed compactions" to frequently compact these deltas for efficient reads.
Finally, the databus forms a crucial piece of our data infrastructure and offers a change queue to real time streaming applications.
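The delta idea can be illustrated with a toy model. In the Python sketch below, a row is a list of deltas: a `literal` delta replaces the whole document and an `update` delta sets or removes keys. Replaying deltas sorted by timestamp gives every replica the same result regardless of arrival order, and compaction folds older deltas into one literal without changing that result. The delta shapes, the integer timestamps, and the compaction cutoff are assumptions for illustration; this is not EmoDB's actual JSON delta specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delta:
    timestamp: int        # assumption: unique integer ordering key
    kind: str             # "literal" replaces the document, "update" merges keys
    body: dict
    removed: tuple = ()   # keys an "update" deletes


def apply(doc: Optional[dict], delta: Delta) -> dict:
    if delta.kind == "literal":
        return dict(delta.body)
    if delta.kind == "update":
        merged = dict(doc or {})
        merged.update(delta.body)
        for key in delta.removed:
            merged.pop(key, None)
        return merged
    raise ValueError(f"unknown delta kind: {delta.kind}")


def resolve(deltas: list) -> Optional[dict]:
    """Replay deltas in timestamp order, so arrival order does not matter."""
    doc = None
    for delta in sorted(deltas, key=lambda d: d.timestamp):
        doc = apply(doc, delta)
    return doc


def compact(deltas: list, cutoff: int) -> list:
    """Fold deltas with timestamp <= cutoff into one literal delta.

    Only safe once no delta older than the cutoff can still arrive.
    """
    older = [d for d in deltas if d.timestamp <= cutoff]
    newer = sorted((d for d in deltas if d.timestamp > cutoff), key=lambda d: d.timestamp)
    if not older:
        return newer
    snapshot = Delta(max(d.timestamp for d in older), "literal", resolve(older))
    return [snapshot] + newer
```

The cutoff condition in `compact` is where a distributed system has to be careful: folding deltas too early would lose a late-arriving older delta.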
About the Speaker
Fahd Siddiqui Lead Software Engineer, Bazaarvoice
Fahd Siddiqui is a Lead Software Engineer on the data infrastructure team at Bazaarvoice. His interests include highly scalable and distributed data systems. He holds a Master's degree in Computer Engineering from the University of Texas at Austin and frequently speaks at the Austin C* User Group. About Bazaarvoice: Bazaarvoice is a network that connects brands and retailers to the authentic voices of people where they shop. More at www.bazaarvoice.com
[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features | Andrew Liu
Let's talk about how you can get the most out of Azure DocumentDB. In this session we will dive deep into the mechanics of DocumentDB and explain the various levers available to tune performance and scale. From partitioned collections to global databases to advanced indexing and query features - this session will equip you with the best practices and nuggets of information that will become invaluable tools in your toolbox for building blazingly fast large-scale applications.
- Cassandra is an open source, distributed database management system designed to handle large amounts of data across many commodity servers. It was originally developed at Facebook, open-sourced in 2008, and is now an Apache project.
- Cassandra provides high availability with no single point of failure, linear scalability and performance of tens of thousands of queries per second. It is used by many large companies including Netflix, Twitter and eBay.
- Data is organized into tables within keyspaces. Tables must have a primary key which determines how data is partitioned and indexed. Cassandra uses a decentralized architecture with no single point of failure and automatic data distribution across nodes.
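How the primary key drives data distribution can be sketched as a token ring: the partition key is hashed to a token, the first node whose token is at or above it (wrapping around) owns the partition, and further replicas go to the following nodes, as with SimpleStrategy. The Python sketch below assumes one token per node and uses an MD5-derived integer as the token. Current Cassandra versions default to Murmur3Partitioner and use virtual nodes, so real token values and placement differ.

```python
import hashlib
from bisect import bisect_left


class TokenRing:
    """Simplified ring: one token per node, replicas placed clockwise."""

    def __init__(self, node_tokens: dict):
        # node name -> integer token; the ring is kept sorted by token
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]

    @staticmethod
    def token_for(partition_key: str) -> int:
        # Illustrative MD5-based token, not Cassandra's exact partitioner math.
        return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")

    def replicas_for_token(self, token: int, rf: int = 1) -> list:
        if not 1 <= rf <= len(self.ring):
            raise ValueError("rf must be between 1 and the number of nodes")
        # First node whose token >= the key's token; wrap past the highest token.
        start = bisect_left(self.tokens, token) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(rf)]

    def replicas(self, partition_key: str, rf: int = 1) -> list:
        return self.replicas_for_token(self.token_for(partition_key), rf)
```

Because placement depends only on the hash of the partition key, any node can compute where a partition lives without consulting a central directory.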
This document provides an outline for a lecture on software security. It introduces the lecturer, Roman Oliynykov, and covers various topics related to software vulnerabilities like buffer overflows, heap overflows, integer overflows, and format string vulnerabilities. It provides examples of vulnerable code and exploits, and recommendations for writing more secure code to avoid these vulnerabilities.
Apache Cassandra is an open-source distributed database designed to handle large amounts of data across commodity servers in a highly available manner without single points of failure. It uses a gossip protocol for cluster membership and a Dynamo-inspired architecture to provide availability and partition tolerance, while supporting eventual consistency.
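The gossip idea can be sketched in a few lines: each node keeps a view of every node's heartbeat state, and when two nodes gossip they keep, per node, whichever state is newer. The Python sketch below models a state as a `(generation, version)` tuple, where the generation changes when a node restarts and the version grows with each heartbeat. The fixed list of gossip pairs is an assumption that replaces random peer selection so the run is deterministic.

```python
def merge(local: dict, remote: dict) -> dict:
    """Merge two views of node -> (generation, version); the newer state wins.

    Tuples compare generation first, so a restarted node (new generation)
    supersedes any heartbeat version from its previous run.
    """
    merged = dict(local)
    for node, state in remote.items():
        if node not in merged or state > merged[node]:
            merged[node] = state
    return merged


def gossip_round(views: dict, pairs: list) -> None:
    """Each (a, b) pair exchanges views; both end up with the merged view.

    Assumption: pairs are given explicitly instead of chosen at random.
    """
    for a, b in pairs:
        merged = merge(views[a], views[b])
        views[a] = dict(merged)
        views[b] = dict(merged)


if __name__ == "__main__":
    views = {
        "n1": {"n1": (1, 5)},
        "n2": {"n2": (1, 3)},
        "n3": {"n3": (2, 1)},
    }
    gossip_round(views, [("n1", "n2"), ("n2", "n3"), ("n1", "n3")])
    print(views["n1"] == views["n2"] == views["n3"])  # True
```

No node coordinates the exchange, yet after a few pairwise rounds every view converges, which is why gossip avoids a single point of failure for cluster membership.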
Apache Cassandra, part 1 – principles, data model | Andrey Lomakin
The aim of this presentation is to give enterprise architects enough information to decide whether Cassandra should be their project's data store. It describes the nuances of Cassandra's architecture and ways to model data and work with it.
This document introduces a block-based data fusion language for defining complex event processing queries in smart cities without requiring expertise in CEP languages. It uses processing blocks that can be chained together to represent queries. Templates define custom blocks with free input/output parameters. Wildcard template binding selects data streams at deployment time based on stream metadata or SPARQL constraints. This enables non-experts like city administrators to define CEP queries for large volumes of sensor data in smart cities. Future work includes implementing full SPARQL-based wildcard instantiation and usability testing.
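The block idea maps naturally onto composable stream operators. The Python sketch below models each processing block as a function from an event stream to an event stream, chains blocks into a query, and selects input streams by matching metadata as a simplified stand-in for wildcard template binding. The block names, event fields, and dictionary-based metadata matching (rather than SPARQL constraints) are hypothetical and not taken from the described language.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical block type: a function from an event stream to an event stream.
Block = Callable[[Iterable[dict]], Iterator[dict]]


def filter_block(predicate: Callable[[dict], bool]) -> Block:
    """Pass through only events that satisfy the predicate."""
    def block(events: Iterable[dict]) -> Iterator[dict]:
        for event in events:
            if predicate(event):
                yield event
    return block


def average_block(field: str, size: int) -> Block:
    """Emit the average of `field` over consecutive, non-overlapping windows of `size` events."""
    def block(events: Iterable[dict]) -> Iterator[dict]:
        window = []
        for event in events:
            window.append(event[field])
            if len(window) == size:
                yield {"avg_" + field: sum(window) / size}
                window = []
    return block


def chain(*blocks: Block) -> Block:
    """Connect blocks so each block's output feeds the next."""
    def query(events: Iterable[dict]) -> Iterator[dict]:
        for block in blocks:
            events = block(events)
        return iter(events)
    return query


def bind_streams(streams: dict, **required: str) -> list:
    """Pick the stream ids whose metadata matches every required key/value."""
    return [sid for sid, meta in streams.items()
            if all(meta.get(k) == v for k, v in required.items())]
```

For example, `chain(filter_block(lambda e: e["temp"] > 10), average_block("temp", 2))` builds a two-block query, and `bind_streams(streams, type="temperature")` chooses its inputs at deployment time.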
Reviews core networking concepts relevant to the cloud practitioner. We use AWS as the platform; however, the content is generally applicable across clouds.
Note: The instructor-led version of this presentation is at:
https://www.udemy.com/course/primer-for-the-aws-cloud-networking/
The Udemy.com course titled Primer for the AWS Cloud: Networking.
For this upcoming meetup, Juan Valencia, Principal Engineer at ShareThis, will present ShareThis's real-world use of Apache Cassandra for high-throughput, mission-critical applications.
This meetup will cover how to set up your projects for success: designing a good data model, running Cassandra, and using the Hector Java client. We will have a Q&A session at the end of Juan's presentation to make sure everyone's questions are answered.
Hope you can make it!
What You Will Learn at this Meetup:
• Real-World Use Case on ShareThis + Apache Cassandra
• Data Modeling with Apache Cassandra
• Using the Java Hector Client Library with Cassandra
Abstract
Juan Valencia, Principal Engineer at ShareThis, will be presenting on the use of Cassandra for high throughput applications. ShareThis has been running on Cassandra since version 0.6 and currently runs 4 Cassandra clusters, powering batch analytics, real-time analytics, a counter service, and a data lookup service.
This document provides an overview of Cassandra data modeling concepts and techniques. It discusses Cassandra's data model, architecture, data types, consistency levels, and more. Key concepts covered include defining primary keys, including compound primary keys, working with wide rows for time series data, using materialized views, secondary indexes, counters, and time to live for expiring data. The document uses examples to illustrate these Cassandra features and how to apply different data modeling patterns.
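Time to live can be illustrated with a small model. In CQL, a write such as `INSERT ... USING TTL 86400` makes the values written by that statement expire after the given number of seconds. The Python sketch below mimics that per-cell behaviour with an explicit clock argument for testability. It does not model tombstones or compaction, only accepts positive TTLs, and its class and method names are hypothetical.

```python
from typing import Any, Optional


class TtlTable:
    """Hypothetical model: cells written with a TTL vanish once now >= write time + ttl."""

    def __init__(self):
        self.cells = {}  # key -> (value, expires_at or None)

    def insert(self, key: Any, value: Any, now: int, ttl: Optional[int] = None) -> None:
        if ttl is not None and ttl <= 0:
            raise ValueError("this sketch only accepts positive TTLs")
        expires_at = None if ttl is None else now + ttl
        # A later write replaces the earlier cell, including its expiry.
        self.cells[key] = (value, expires_at)

    def get(self, key: Any, now: int) -> Optional[Any]:
        if key not in self.cells:
            return None
        value, expires_at = self.cells[key]
        if expires_at is not None and now >= expires_at:
            return None  # expired: reads treat the cell as deleted
        return value
```

Rewriting a cell with a new TTL restarts its lifetime, which is how sessions or caches stored this way are typically kept alive.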
MongoDB is a document-oriented, non-relational database that provides an alternative to traditional RDBMSs. It stores JSON-like documents with dynamic schemas, supporting flexible document structures, embedded documents, and indexing. MongoDB has built-in replication for high availability and automatic failover, and built-in sharding for horizontal scalability across multiple servers. It is designed for high performance and can scale both horizontally and vertically.
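Sharding can be pictured as routing each document by its shard key. The Python sketch below shows range-based routing: sorted split points divide the key space into chunks, each with an inclusive lower bound and exclusive upper bound, and each chunk maps to a shard. The split points and shard names are hypothetical. In MongoDB, the `mongos` router consults chunk metadata held by the config servers, and the balancer moves chunks between shards; none of that is modeled here.

```python
from bisect import bisect_right
from typing import Any, Sequence


class RangeShardRouter:
    """Route shard-key values to shards using sorted (hypothetical) chunk boundaries."""

    def __init__(self, split_points: Sequence[Any], shards: Sequence[str]):
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        self.shards = list(shards)

    def shard_for(self, key: Any) -> str:
        # bisect_right sends a key equal to a split point to the chunk that
        # starts there, i.e. chunks are [lower, upper).
        return self.shards[bisect_right(self.split_points, key)]
```

For example, `RangeShardRouter([100, 200], ["shardA", "shardB", "shardC"]).shard_for(150)` routes to `shardB`. Range routing keeps neighbouring keys together, which helps range queries but can concentrate monotonically increasing keys on one shard.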
Cassandra's Sweet Spot - an introduction to Apache Cassandra | Dave Gardner
Slides from my NoSQL Exchange 2011 talk introducing Apache Cassandra. This talk explained the fundamental concepts of Cassandra and then demonstrated how to build a simple ad-targeting application using PHP, with a focus on data modeling.
Video of talk: http://skillsmatter.com/podcast/home/cassandra/js-2880
The document provides an overview of key concepts in the network layer, including:
- The network layer is responsible for moving data between sending and receiving endpoints by encapsulating transport segments into datagrams.
- The two main functions of the network layer are forwarding, which moves packets through routers, and routing, which determines the path packets take from source to destination.
- IPv4 addresses are 32-bit identifiers assigned to network interfaces that allow endpoints to communicate and routers to forward packets. IP addresses use hierarchy and prefixes to scale routing across large networks like the Internet.
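Forwarding with hierarchical prefixes relies on longest-prefix matching: among all table entries whose prefix contains the destination, the router uses the most specific one. The Python sketch below uses the standard `ipaddress` module; the routing table and next-hop names are made up for illustration.

```python
import ipaddress
from typing import Optional


def build_table(routes: list) -> list:
    """Turn (prefix, next_hop) strings into (network, next_hop) pairs."""
    return [(ipaddress.ip_network(prefix), hop) for prefix, hop in routes]


def lookup(table: list, destination: str) -> Optional[str]:
    """Return the next hop of the longest prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for network, hop in table:
        if addr in network and (best_net is None or network.prefixlen > best_net.prefixlen):
            best_net, best_hop = network, hop
    return best_hop


if __name__ == "__main__":
    # Hypothetical routing table and next-hop names.
    table = build_table([
        ("0.0.0.0/0", "default-gw"),
        ("10.0.0.0/8", "router-1"),
        ("10.1.0.0/16", "router-2"),
    ])
    print(lookup(table, "10.1.2.3"))  # router-2
    print(lookup(table, "10.9.9.9"))  # router-1
    print(lookup(table, "8.8.8.8"))   # default-gw
```

The linear scan keeps the sketch short; real routers use tries or specialized hardware so a lookup does not grow with table size.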
This document provides an introduction to security and cryptography. It begins with an overview of security goals like confidentiality, authenticity, integrity, and non-repudiation. It then discusses symmetric cryptography algorithms like DES and AES, and how they provide confidentiality. Asymmetric cryptography algorithms like RSA and ECC are introduced for providing authentication, non-repudiation through digital signatures, and key exchange. Hash functions are described as providing integrity and as a building block for digital signatures. Modes of operation for block ciphers like CBC are covered. Popular algorithms and their application to security goals are summarized.
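The difference between integrity from a plain hash and authenticity from a keyed MAC can be shown with Python's standard library: a SHA-256 digest detects changes, but anyone can recompute it for altered data, whereas an HMAC tag can only be produced by holders of the shared key. This is a minimal sketch with a made-up key, not a complete protocol.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Integrity check value: changes if any bit of data changes."""
    return hashlib.sha256(data).hexdigest()


def make_tag(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 tag; requires the shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking, via timing, where the comparison fails
    return hmac.compare_digest(make_tag(key, message), tag)


if __name__ == "__main__":
    key = b"example-shared-key"  # hypothetical key; use os.urandom(32) in practice
    msg = b"transfer 100 to alice"
    tag = make_tag(key, msg)
    print(sha256_hex(b"abc"))
    print(verify_tag(key, msg, tag))                       # True
    print(verify_tag(key, b"transfer 900 to alice", tag))  # False
```

An HMAC gives integrity and authenticity between parties sharing a key, but not non-repudiation: either party could have produced the tag, which is why digital signatures use asymmetric keys instead.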
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence | IndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Full-RAG: A modern architecture for hyper-personalization | Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems that aims to push beyond the limitations of traditional models through deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyper-personalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations when deploying cutting-edge AI solutions.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Programming Foundation Models with DSPy - Meetup Slides
Netcetera
1. Cassandra for Barcodes, Products and Scans:
The Backend Infrastructure at Scandit
@scandit
www.scandit.com February 1, 2012
Christof Roduner
Co-founder and COO
christof@scandit.com
3. 3
WHAT IS SCANDIT?
Scandit provides developers with best-in-class tools to build, analyze and monetize product-centric apps.
IDENTIFY: Products
ANALYZE: User Interest
MONETIZE: Apps
4. 4
IDENTIFY: BARCODE SCANNER
Scandit SDK
Fastest and most reliable barcode scanning technology for camera phones
Available for all major platforms:
iOS
Android
Symbian / Qt
Phonegap
Features:
Scans from any angle
Does not need autofocus
Works with low-end cameras (→ Android, iPad2)
Supports all barcode types (1D, 2D)
6. 6
ANALYZE: THE SCANALYTICS PLATFORM
Tool for app publishers
App-specific usage statistics
Insights into consumer behavior:
What do users scan?
Product categories? Groceries, electronics, books, cosmetics, …?
Where do users scan?
At home? Or while in a retail store?
Top products and brands
Identify new opportunities:
Customer engagement
Product interest
Cross-selling and up-selling
9. 9
BACKEND REQUIREMENTS
Product database
Many millions of products
Many different data sources
Curation of product data (filtering, etc.)
Analysis of scans
Accept and store high volumes of scans
Generate statistics over extended time periods
Correlate with product data
Provide reports to developers
10. 10
BACKEND DESIGN GOALS
Scalability
High-volume storage
High-volume throughput
Support a large number of concurrent client requests (from apps)
Availability
Low maintenance
11. 11
WHICH DATABASE?
Apache Cassandra
Large, distributed key-value store (DHT)
«NoSQL»
Inspired by:
Amazon’s Dynamo distributed storage system
Google’s BigTable data model
Originally developed at Facebook
Inbox search
12. 12
WHY DID WE CHOOSE IT?
Looked very fast
Even when data is much larger than RAM
Performs well in write-heavy environment
Proven scalability
Without downtime
Tunable replication
Easy to run and maintain
No sharding
All nodes are the same - no coordinators, masters, slaves, …
Data model
YMMV…
13. 13
WHAT YOU HAVE TO GIVE UP
Joins
Referential integrity
Transactions
Expressive query language
Consistency (tunable, but…)
Limited support for:
Schema
Secondary indices
14. 14
CASSANDRA DATA MODEL
Column families
Rows
Columns
(Supercolumns)
We’ll skip them - Cassandra developers don’t like them
Disclaimer: I tend to say «hash» when I mean «dictionary, map, associative array» (Can you tell my favorite language?)
15. 15
COLUMNS AND ROWS
Column:
Is a name-value pair
Row:
Has exactly one key
Contains any number of columns
Columns are always automatically sorted by their name
Column family:
A collection of any number of rows (!)
Has a name
«Like a table»
16. 16
EXAMPLE COLUMN FAMILY
A column family «users» containing two rows
Columns can be different in every row
First row has a column named «phone», second row does not
Rows can have many columns
You can add millions of them
"users": {
"christof": {
"email": "christof@scandit.com",
"phone": "123-456-7890"
},
"moritz": {
"email": "moritz@scandit.com",
"web": "www.example.com"
}
}
Row with key «christof»
Two columns per row, automatically sorted by their names (e.g. «email», «web» in row «moritz»)
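The column-family model can be pictured with ordinary nested dictionaries, in the spirit of the slide's «hash» wording. This is a minimal Python sketch, not Cassandra's storage engine: the helper names `get_row` and `insert` are hypothetical, and sorting happens when a row is read rather than on disk.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory picture of a column family:
# row key -> {column name: column value}. Rows need not share columns.
ColumnFamily = Dict[str, Dict[str, str]]

users: ColumnFamily = {
    "christof": {"phone": "123-456-7890", "email": "christof@scandit.com"},
    "moritz": {"web": "www.example.com", "email": "moritz@scandit.com"},
}


def get_row(cf: ColumnFamily, key: str) -> List[Tuple[str, str]]:
    """Return a row's (name, value) pairs sorted by column name."""
    return sorted(cf.get(key, {}).items())


def insert(cf: ColumnFamily, key: str, name: str, value: str) -> None:
    """Add or overwrite a single column; no schema restricts the names."""
    cf.setdefault(key, {})[name] = value
```

`get_row(users, "christof")` yields the «email» column before «phone», even though the dictionary literal lists «phone» first.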
17. 17
DATA IN COLUMN NAMES
Column names can be used to store data
Frequent pattern in Cassandra
Takes advantage of column sorting
"logins": {
"christof": {
"2012-01-29 16:22:30 +0100": "208.115.113.86",
"2012-01-30 07:48:03 +0100": "66.249.66.183",
"2012-01-30 18:06:55 +0100": "208.115.111.70",
"2012-01-31 12:37:26 +0100": "66.249.66.183"
},
"moritz": {
"2012-01-23 01:12:49 +0100": "205.209.190.116"
}
}
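Reading a time span from such a row is a slice over sorted column names. The sketch below assumes every timestamp uses the same fixed-width format and the same UTC offset (+0100), which is what makes string order match chronological order; `column_slice` is a hypothetical helper, not a client-library call.

```python
import bisect
from typing import Dict, List, Tuple

# "logins" column family: row key = user, column name = login time,
# column value = IP address.
logins: Dict[str, Dict[str, str]] = {
    "christof": {
        "2012-01-29 16:22:30 +0100": "208.115.113.86",
        "2012-01-30 07:48:03 +0100": "66.249.66.183",
        "2012-01-30 18:06:55 +0100": "208.115.111.70",
        "2012-01-31 12:37:26 +0100": "66.249.66.183",
    },
    "moritz": {
        "2012-01-23 01:12:49 +0100": "205.209.190.116",
    },
}


def column_slice(row: Dict[str, str], start: str, finish: str) -> List[Tuple[str, str]]:
    """Return the columns whose names lie in [start, finish], in sorted order."""
    names = sorted(row)                      # Cassandra keeps names sorted; we sort here
    lo = bisect.bisect_left(names, start)    # first name >= start
    hi = bisect.bisect_right(names, finish)  # one past the last name <= finish
    return [(n, row[n]) for n in names[lo:hi]]


# All of christof's logins on 30 January 2012.
jan_30 = column_slice(logins["christof"],
                      "2012-01-30 00:00:00 +0100", "2012-01-30 23:59:59 +0100")
```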
18. 18
SCHEMA AND DATA TYPES
Schema is optional
Data type can be defined for:
Keys
The values of all columns with a given name
The column names in a column family (CF)
By default, data type BLOB is used
Data Types
BLOB (default)
ASCII text
UTF8 text
Timestamp
Boolean
UUID
Integer (arbitrary length)
Float
Double
Decimal
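Declaring a type for column names does more than validate them: it determines the order in which columns are sorted. The sketch compares three orders for the same numbers, assuming the arbitrary-length integer type is stored as minimal big-endian two's complement (an assumption for illustration; `encode_varint` is a hypothetical helper that only handles non-negative values).

```python
from typing import List


def encode_varint(n: int) -> bytes:
    """Big-endian two's complement with one spare bit for the sign (n >= 0 only)."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    length = n.bit_length() // 8 + 1  # +1 byte keeps the sign bit clear
    return n.to_bytes(length, "big", signed=True)


numbers: List[int] = [2, 10, 255, 256]

# Names stored as untyped bytes (BLOB): compared byte by byte.
blob_order = sorted(numbers, key=encode_varint)
# Names stored as UTF8/ASCII text: compared character by character.
text_order = sorted(str(n) for n in numbers)
# Names declared as integers: compared numerically.
integer_order = sorted(numbers)
```

Storing numbers as text or as untyped bytes therefore breaks numeric column slices; declaring an integer type for the column names keeps them in numeric order.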
19. 19
CLUSTER ORGANIZATION
[Figure: a ring of four nodes, Node 1 (token 0), Node 2 (token 64), Node 3 (token 128) and Node 4 (token 192); range 1-64 is stored on node 2, range 65-128 on node 3]
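The four-node ring on this slide can be expressed as a lookup: each node is responsible for the range that ends at its own token, and anything beyond the highest token wraps around to the node with the lowest token. A minimal sketch using the slide's toy token values; the wrap-around rule is stated here as an assumption consistent with the ranges shown, and `owner` is a hypothetical helper.

```python
import bisect
from typing import List

# Toy ring from the slide: four nodes, tokens kept in ascending order.
TOKENS: List[int] = [0, 64, 128, 192]
NODES: List[str] = ["Node 1", "Node 2", "Node 3", "Node 4"]


def owner(token: int) -> str:
    """A node owns (previous node's token, its own token]; the top wraps to the start."""
    i = bisect.bisect_left(TOKENS, token)  # index of the first node token >= token
    return NODES[i % len(NODES)]           # past the last node: wrap to Node 1
```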
20. 20
STORING A ROW
1. Calculate the MD5 hash of the row key
Example: md5("foobar") = 48 (toy value; a real MD5 hash is 128 bits long)
2. Determine the data range for the hash
Example: 48 lies within range 1-64
3. Store the row on the node responsible for that range
Example: store on node 2
[Figure: four-node ring (tokens 0, 64, 128, 192); row «foobar» with hash 48 falls in range 1-64 and is stored on node 2]
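The three steps can be sketched in Python with `hashlib.md5`. To stay on the slide's toy ring, the 128-bit digest is folded into 0-255 with a modulo; this folding is an assumption for illustration (Cassandra's RandomPartitioner works with tokens between 0 and 2**127), so the token computed for «foobar» need not be 48. `token_for` and `node_for` are hypothetical helpers.

```python
import bisect
import hashlib
from typing import List

TOKENS: List[int] = [0, 64, 128, 192]
NODES: List[str] = ["Node 1", "Node 2", "Node 3", "Node 4"]
TOKEN_SPACE = 256  # toy ring size, matching the slide's token values


def token_for(key: str) -> int:
    """Step 1: hash the row key with MD5 and fold it into the toy token space."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def node_for(key: str) -> str:
    """Steps 2 and 3: find the range containing the token and the node that owns it."""
    i = bisect.bisect_left(TOKENS, token_for(key))
    return NODES[i % len(NODES)]
```

Sorting keys with `sorted(keys, key=token_for)` generally gives a different order from `sorted(keys)`, which is why a range of rows by key cannot be read directly from a hash-partitioned cluster.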
21. 21
IMPLICATIONS
Cluster automatically balanced
Load is shared equally between nodes
No hotspots
Scaling out?
Easy
Divide data ranges by adding more nodes
Cluster rebalances itself automatically
Range queries not possible
You can’t retrieve «all rows from A-C»
Rows are not stored in their «natural» order
Rows are stored in order of their md5 hashes
22. 22
IF YOU NEED RANGE QUERIES…
Option 1: «Order Preserving Partitioner» (OPP)
OPP determines node based on a row’s key instead of its hash
Don’t use it…
Manually balancing a cluster is hard
Hotspots
Balancing cluster for one column family creates hotspot for another
Option 2: Use columns instead of rows
Columns are always sorted
Rows can store millions of columns
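Option 2 can be sketched as a single index row whose column names are the row keys you want to range over; a key range then becomes a column slice. The names `index_row`, `add_to_index` and `keys_in_range` are hypothetical, and the range is half-open ([start, stop)) so that «all rows from A-C» is written as `keys_in_range("a", "d")`.

```python
import bisect
from typing import Dict, List

# Hypothetical index row: each column name is a row key of the "users"
# column family; the column values are unused (empty strings).
index_row: Dict[str, str] = {}


def add_to_index(row_key: str) -> None:
    """Record a row key as a column name in the index row."""
    index_row[row_key] = ""


def keys_in_range(start: str, stop: str) -> List[str]:
    """Return indexed row keys k with start <= k < stop, in sorted order."""
    names = sorted(index_row)              # column names are kept sorted
    lo = bisect.bisect_left(names, start)  # first name >= start
    hi = bisect.bisect_left(names, stop)   # first name >= stop (excluded)
    return names[lo:hi]
```

All entries of the index live in one row, and therefore on one set of replicas, so a very busy index row can itself become a hotspot.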
23. 23
REPLICATION
Tunable replication factor (RF)
RF > 1: rows are automatically replicated to the next RF-1 nodes
Tunable replication strategy
«Ensure two replicas in different data centers, racks, etc.»
[Figure: four-node ring (tokens 0, 64, 128, 192) showing replica 1 and replica 2 of row «foobar» on neighboring nodes]
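The basic placement rule (first replica on the node that owns the token, further replicas on the next nodes around the ring) fits in a few lines. This sketch ignores racks and data centers, roughly like Cassandra's SimpleStrategy; tokens and node names are the slide's toy values, and `replicas` is a hypothetical helper.

```python
import bisect
from typing import List

TOKENS: List[int] = [0, 64, 128, 192]
NODES: List[str] = ["Node 1", "Node 2", "Node 3", "Node 4"]


def replicas(token: int, rf: int) -> List[str]:
    """Owner of the token first, then the next rf-1 nodes clockwise."""
    if not 1 <= rf <= len(NODES):
        raise ValueError("replication factor must be between 1 and the node count")
    first = bisect.bisect_left(TOKENS, token) % len(NODES)  # owning node
    return [NODES[(first + k) % len(NODES)] for k in range(rf)]
```

With RF 2, a row hashing to 48 lands on Node 2 and is copied to Node 3; a rack- or data-center-aware strategy would instead skip nodes until the placement constraint is met.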
24. 24
CLIENT ACCESS
Clients can send read and write requests to any node
This node acts as the coordinator
The coordinator forwards the request to the nodes where the data resides
[Figure: four-node ring; a client sends insert("foobar": { "email": "fb@example.com" }) to one node, which forwards it to replica 1 and replica 2 of row «foobar»]
25. 25
CONSISTENCY LEVELS
For all requests, clients can set a consistency level (CL)
For writes:
CL defines how many replicas must be written before «success» is returned to the client
For reads:
CL defines how many replicas must respond before the result is returned to the client
Consistency levels:
ONE
QUORUM
ALL
… (data center-aware levels)
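The number of replicas each level waits for can be computed directly; QUORUM means a majority, floor(RF/2) + 1. The sketch also encodes the usual rule of thumb that a read is guaranteed to overlap at least one replica holding the latest acknowledged write when the read and write levels together exceed RF. Function names are hypothetical, and the data center-aware levels are left out.

```python
def required_acks(level: str, rf: int) -> int:
    """Replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"level not covered by this sketch: {level}")


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read set must intersect every acknowledged write set (R + W > RF)."""
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf
```

With RF 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) always overlap, whereas ONE plus ONE (1 + 1) may read a replica that has not yet seen the write.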
26. 26
INCONSISTENT DATA
Example scenario:
Replication factor 2
Two existing replicas for row «foobar»
Client overwrites existing columns in «foobar»
Replica 2 is down
What happens:
Column is updated in replica 1, but not replica 2 (even with CL=ALL !)
Timestamps to the rescue
Every column has a timestamp
Timestamps are supplied by clients
Upon read, column with latest timestamp wins
→ Use NTP to keep client clocks synchronized
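Reconciliation by timestamp amounts to picking the maximum. A minimal sketch, assuming a hypothetical `Column` record with a client-supplied integer timestamp; Cassandra's handling of equal timestamps is not modeled here (Python's `max` simply keeps the first of the tied versions).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the writing client, e.g. microseconds since the epoch


def reconcile(versions: Iterable[Column]) -> Column:
    """Return the version with the highest timestamp (last write wins)."""
    return max(versions, key=lambda c: c.timestamp)


# Replica 1 received the overwrite; replica 2 was down and still has the old value.
old = Column("email", "christof@scandit.com", 1000)
new = Column("email", "c@example.com", 2000)
winner = reconcile([old, new])
```

If the client that wrote `new` had a clock running behind, its timestamp could be lower than `old`'s and the newer value would silently lose, which is why client clocks need to stay in sync.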
28. 28
RETRIEVING DATA (API)
At a row level, you can…
Get all rows
Get a single row by specifying its key
Get a number of rows by specifying their keys
Get a range of rows
Only with OPP, strongly discouraged
At a column level, you can…
Get all columns
Get a single column by specifying its name
Get a number of columns by specifying their names
Get a range of columns by specifying the names of the first and last column
Again: no ranges of rows
32. 32
SECONDARY INDICES
Secondary indices can be defined for (single) columns
Secondary indices only support the equality predicate (=) in queries
Each node maintains an index for the data it owns
When an indexed column is queried, the request must be forwarded to all nodes
Sometimes better to manually maintain your own index
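A hand-maintained index is an extra column family written alongside the data, so that a lookup by the indexed value touches a single row instead of every node. This sketch uses two hypothetical in-memory column families, `users` and `users_by_email`; because Cassandra offers no transactions, a real application must cope with one of the two writes failing.

```python
from typing import Dict, List

# Hypothetical column families held in memory.
users: Dict[str, Dict[str, str]] = {}           # row key = user, columns = attributes
users_by_email: Dict[str, Dict[str, str]] = {}  # row key = email, column names = user keys


def save_user(user_key: str, email: str) -> None:
    """Write the user row and its index entry; the application keeps both in sync."""
    old_email = users.get(user_key, {}).get("email")
    if old_email is not None and old_email != email:
        users_by_email.get(old_email, {}).pop(user_key, None)  # drop the stale entry
    users.setdefault(user_key, {})["email"] = email
    users_by_email.setdefault(email, {})[user_key] = ""


def find_by_email(email: str) -> List[str]:
    """One row read in the index column family."""
    return sorted(users_by_email.get(email, {}))
```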
33. 33
PRODUCTION EXPERIENCE
No stability issues
Very fast
Language bindings don’t have the same quality
Out of sync, bugs
Data model is a mental twist
Design-time decisions sometimes hard to change
Rudimentary access control
34. 34
TRYING OUT CASSANDRA
DataStax website
Company founded by Cassandra developers
Provides
Documentation
Amazon Machine Image
Apache website
Mailing lists
35. 35
CLUSTER AT SCANDIT
Several nodes in two data centers
Linux machines
Identical setup on every node
Allows for easy failover
36. 36
NODE ARCHITECTURE
[Figure: requests from mobile apps and web browsers reach the website & REST API (Ruby on Rails, Rack), running under Phusion Passenger (mod_passenger); the node also connects to other nodes]
ETH Zurich spin-off company
Founded by three former PhD students from ETH Zurich and MIT
Mission: Provide mobile app developers with tools to build…
At the center of our business: Barcode scanning algorithm developed at ETH Zurich
SDK
How is it different from ZXing, ZBar, etc.?
All platforms
Low-end Android phones
iPad2
Faster (before autofocus triggers)
Dynamic range (handles close codes well)