This document provides an overview of NoSQL in Action, including summaries of Amazon's Dynamo and MongoDB. It discusses how Dynamo was designed for Amazon's infrastructure and business needs, including high availability even during failures. It also describes MongoDB as a schemaless document database that supports hierarchical namespaces and stores data in BSON format. The document outlines several key aspects of NoSQL databases.
Vote Early, Vote Often: From Napkin to Canvassing Application in a Single Weekend – Jim Czuprynski
The frenetic pace of application development in modern IT organizations means it’s not unusual to demand an application be built with minimal requirement gathering – literally, from a napkin-based sketch – to a working first draft of the app within extremely short time frames – even a weekend! – with production deployment to follow just a few days later.
I'll demonstrate a real-life application development scenario – the creation of a mobile application that gives election canvassers a tool to identify, classify, and inform voters in a huge suburban Chicago voting district – using the latest Oracle application development UI, data modeling tools, and database technology. Along the way, we’ll show how Oracle APEX makes short work of building a working application while the Oracle DBA leverages her newest tools – SQL Developer and Data Modeler – to build a secure, reliable, scalable application for her development team.
Conquer Big Data with Oracle 18c, In-Memory External Tables and Analytic Functions – Jim Czuprynski
There’s an onslaught of Big Data coming to our IT shops - zettabytes of it! – but instead of your application developers struggling to learn new languages and techniques to analyze it, why not leverage Oracle Database 18c?
I'll demonstrate how to tackle handling the coming Big Data tidal wave with the best tool ever designed to filter, sort, aggregate, and report information: Structured Query Language. We’ll also take a closer look at using some new Analytic Functions in 19c to make short work of complex analyses and how best to leverage 18c’s latest Database In-Memory features for External Tables. And we’ll even explore how easy it is to leverage External Tables in Autonomous Data Warehouse using the latest features of DBMS_CLOUD.
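To ground the claim that SQL is the right tool for this work, here is a minimal sketch of one analytic (window) function run from Python via the python-oracledb driver; the connection details and the sensor_readings table are invented for illustration, not taken from the talk.

```python
# A minimal sketch of running an analytic (window) function from Python
# with the python-oracledb driver. The connection details and the
# "sensor_readings" table are hypothetical placeholders.
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/XEPDB1")
cur = conn.cursor()

# Rank each device's readings by value within its region -- the kind of
# filter/sort/aggregate work the session argues SQL already does well.
cur.execute("""
    SELECT region,
           device_id,
           reading,
           RANK() OVER (PARTITION BY region ORDER BY reading DESC) AS rnk
      FROM sensor_readings
""")
for region, device_id, reading, rnk in cur:
    print(region, device_id, reading, rnk)
```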
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick? – Jim Czuprynski
Autonomous Transaction Processing (ATP) - the second in the family of Oracle’s Autonomous Databases – offers Oracle DBAs the ability to apply a force multiplier for their OLTP database application workloads. However, it’s important to understand both the benefits and limitations of ATP before migrating any workloads to that environment. I'll offer a quick but deep dive into how best to take advantage of ATP - including how to load data quickly into the underlying database – and some ideas on how ATP will impact the role of Oracle DBA in the immediate future. (Hint: Think automatic transmission instead of stick-shift.)
Getting Started with MongoDB Using the Microsoft Stack – MongoDB
Speaker: John Randolph, Sr. Software Developer, Gexa Energy
Level: 100 (Beginner)
Track: Developer
Gexa has implemented several applications using MongoDB as a document repository storing multiple types of files (PDF, XLS, CSV, etc.). This entry-level session is intended to share what we’ve learned in developing and deploying our first applications in an on-premises Microsoft environment. We’ll provide architectural and development information about what we’ve done. The focus is to help get your projects up to speed more quickly. This will be useful to teams moving from pilot to production and for developers getting started with the .NET MongoDB drivers. Plenty of code samples will be shown. We’ll discuss our successful engagement with MongoDB Consulting to help us design and deploy a high-quality production environment.
What You Will Learn:
- Ideas for how to store and retrieve documents of different sizes, types, and volumes. We’ll describe the storage, partitioning, and indexing techniques used to provide sub-second retrieval from collections with over 100 million records (a minimal sketch follows this list).
- The issues addressed moving to production, including backup, disaster recovery, SSL, using replica sets, implementing authorization and authentication, changing default settings, and creating a full path-to-production set of environments.
- A successful pattern for building applications with .NET, providing teams some ideas to jump-start their development, along with tips and tricks for using the .NET drivers.
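The session’s samples are in .NET; as a stand-in, here is a hedged sketch of the same ideas in Python with pymongo and GridFS. All database, collection, and field names are invented for illustration and are not Gexa’s actual schema.

```python
# Sketch: store a file's binary payload via GridFS and index the
# metadata so lookups stay sub-second on large collections.
from pymongo import MongoClient, ASCENDING
import gridfs

client = MongoClient("mongodb://localhost:27017")
db = client["document_repository"]

# GridFS splits large files (PDF, XLS, CSV, ...) into chunks for storage.
fs = gridfs.GridFS(db)
file_id = fs.put(b"%PDF-1.7 ...", filename="invoice.pdf")

# A compound index on commonly queried metadata supports fast retrieval.
db["fs.files"].create_index([("filename", ASCENDING),
                             ("uploadDate", ASCENDING)])

print(fs.get(file_id).filename)
```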
The document discusses different types of NoSQL databases including key-value stores like Memcached and Redis, document databases like Couchbase and MongoDB, column-oriented databases like Cassandra, and graph databases like Neo4j. It explains the basic data models and architectures of each type of NoSQL database. NoSQL databases provide more flexible schemas and better horizontal scalability than traditional relational databases.
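As a concrete taste of the simplest of these models, the sketch below exercises a key-value store through redis-py; the key names and TTL are illustrative assumptions.

```python
# A minimal illustration of the key-value model with redis-py.
# Keys and values are opaque to the store; the application owns the schema.
import redis

r = redis.Redis(host="localhost", port=6379)
r.set("session:42", '{"user": "alice", "cart": [101, 202]}')  # value is just bytes
print(r.get("session:42"))
r.expire("session:42", 3600)  # TTLs make key-value stores natural caches
```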
This document discusses Delta Change Data Feed (CDF), which allows capturing changes made to Delta tables. It describes how CDF works by storing change events like inserts, updates and deletes. It also outlines how CDF can be used to improve ETL pipelines, unify batch and streaming workflows, and meet regulatory needs. The document provides examples of enabling CDF, querying change data and storing the change events. It concludes by offering a demo of CDF in Jupyter notebooks.
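The enable-and-query flow described above can be sketched in PySpark with Delta Lake’s documented CDF options; the table name and version range here are placeholders.

```python
# Sketch of the Delta Change Data Feed flow, using PySpark with Delta Lake.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cdf-demo").getOrCreate()

# 1. Enable CDF on an existing Delta table.
spark.sql("""
    ALTER TABLE events
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# 2. Read the change events (inserts, updates, deletes) between versions.
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 1)
           .option("endingVersion", 5)
           .table("events"))

# Each row carries _change_type, _commit_version and _commit_timestamp.
changes.show()
```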
Presented at SQL Saturday 220, Atlanta, GA, May 2013. If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT.
Speaker: Isabel Peters, Software Engineer, MongoDB
Track: WTC Lounge
Data backup is a critical process to keep your data safe and recoverable in case of an unexpected local storage failure. At MongoDB, we develop tools to easily backup your data, keep it safe and restore it so that you don’t have to worry or spend time thinking about the process, allowing you to focus on your various other responsibilities. Come discover what the architecture of a backup system looks like.
Graph databases and the Panama Papers – Stefan Armbruster – Codemotion Milan ... – Codemotion
In spring 2016 the first press reports regarding the "Panama Papers" were released. With almost 3 TB of raw data this was by far the largest leak of data worldwide. This talk gives some technical insights into how the ICIJ (International Consortium of Investigative Journalists) worked with that amount of data to provide journalists with an easy-to-use interface for doing their research. Among other technologies, one core component was a graph database. In a live demo on the Panama Papers dataset we'll explore the power and conciseness of the graph query language Cypher.
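To make the "power and conciseness" claim concrete, here is a hedged sketch of a Cypher query issued through the official neo4j Python driver; the node labels and properties only loosely echo the ICIJ model and are not the actual Panama Papers schema.

```python
# Cypher via the official neo4j Python driver. Labels and properties
# are illustrative, not the real Panama Papers data model.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

query = """
MATCH (o:Officer)-[:OFFICER_OF]->(e:Entity)-[:REGISTERED_IN]->(j:Jurisdiction)
WHERE j.name = $jurisdiction
RETURN o.name AS officer, e.name AS entity
LIMIT 10
"""

with driver.session() as session:
    for record in session.run(query, jurisdiction="Panama"):
        print(record["officer"], "->", record["entity"])
```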
This document summarizes a presentation on the Elastic Stack. It discusses the main components - Elasticsearch for storing and searching data, Logstash for ingesting data, Kibana for visualizing data. It provides examples of using Elasticsearch for search, analytics, and aggregations. It also briefly mentions new features across the Elastic Stack like update by query, ingest nodes, pipeline improvements, and APIs for management and metrics.
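A hedged sketch of the search-plus-aggregation pattern mentioned above, using the official elasticsearch-py client; the index and field names are made up for illustration.

```python
# Full-text search combined with a terms aggregation in one request.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="web-logs",
    body={
        "query": {"match": {"message": "timeout"}},
        "aggs": {"by_status": {"terms": {"field": "status_code"}}},
        "size": 5,
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["message"])
print(resp["aggregations"]["by_status"]["buckets"])
```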
Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor... – Databricks
This session presents a simple, human-based approach to creating test suites targeting multiple points of contact in a data solution. Commonly, an enterprise will pick a data processing solution with heavy GUIs because it makes for an easy-to-understand workflow around data. However, those solutions are still unable to verify the simplest use case: “If I put data into a solution to process data, then I should get a desired result.”
FIS will demonstrate and teach you how to build a unique testing solution on top of Apache Spark. Under its solution, FIS can actually prove to users in their organization that when they put data in, they get the correct result out. They can also enlist their entire team from product owner to developer to write complete unit tests. The type of flexibility Spark enables allows you to take unique paths in building robust, understandable data flows. The transformational element is the ability to do this in milliseconds, and not wait till the entire pipeline finishes.
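A given/when/then flavor of such a test might look like the following pytest sketch over a toy PySpark transformation; the add_total step is a stand-in for a real pipeline step, not FIS’s actual solution.

```python
# A BDD-style unit test of a single Spark transformation.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_total(df):
    return df.withColumn("total", F.col("price") * F.col("qty"))


def test_put_data_in_get_expected_result_out():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    # Given: a tiny, human-readable input dataset
    df = spark.createDataFrame([("widget", 2.0, 3)], ["item", "price", "qty"])
    # When: the transformation runs
    result = add_total(df).collect()
    # Then: the desired result comes out -- in milliseconds, without
    # waiting for the whole pipeline to finish
    assert result[0]["total"] == 6.0
```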
The document provides an overview of different NoSQL database types, including key-value stores, document databases, column-oriented databases, graph databases, and caches. It discusses examples of databases for each type and common use cases. The document also covers querying graph databases, polyglot persistence using multiple database types, and concludes with when each database type is best suited and when not to use a NoSQL database.
Presented at JavaOne 2013, Tuesday September 24.
"Data Modeling Patterns" co-created with Ian Robinson.
"Pitfalls and Anti-Patterns" created by Ian Robinson.
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes – MongoDB
1. The document discusses using MongoDB and data lakes for enterprise data management. It outlines the current issues with relational databases and how MongoDB addresses challenges like flexibility, scalability and performance.
2. Various architectures for enterprise data management with MongoDB are presented, including using it for raw, transformed and aggregated data stores.
3. The benefits of combining MongoDB and Hadoop in a data lake are greater agility, insight from handling different data structures, scalability and low latency for real-time decisions.
A Day in the Life of a Druid Implementor and Druid's Roadmap – Itai Yaffe
This document summarizes a typical day for a Druid architect. It describes common tasks like evaluating production clusters, analyzing data and queries, and recommending optimizations. The architect asks stakeholders questions to understand usage and helps evaluate if Druid is a good fit. When advising on Druid, the architect considers factors like data sources, query types, and technology stacks. The document also provides tips on configuring clusters for performance and controlling segment size.
Ensuring Quality in Data Lakes (D&D Meetup Feb 22) – lakeFS
The document discusses improving data quality in a data lake. It describes three levels (L1-L3) of data lake maturity:
L1 involves storing data in an object store in a basic format like CSV files. This provides good performance, cost efficiency, and developer experience.
L2 adds optimized table formats like Delta Lake, Hudi and Iceberg that maintain metadata and transaction logs to enable features like schema enforcement, data versioning and isolation.
L3 adds data version control systems like lakeFS that extend the object store with Git-like source control operations. This allows instantly reverting bad data, developing data in isolation, and simplifying data reproducibility. LakeFS was demonstrated as an example solution.
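To make the L2 idea concrete, here is a small PySpark/Delta Lake sketch of schema enforcement (Hudi and Iceberg behave analogously); the paths and schemas are placeholders.

```python
# Schema enforcement in a Delta table: a mismatched write is rejected.
from pyspark.sql.utils import AnalysisException
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

good = spark.createDataFrame([(1, "alice")], ["id", "name"])
good.write.format("delta").save("/tmp/users")

# A write whose schema doesn't match the table is rejected instead of
# silently corrupting the lake -- unlike raw CSV files at L1.
bad = spark.createDataFrame([("oops", 3.14)], ["wrong", "types"])
try:
    bad.write.format("delta").mode("append").save("/tmp/users")
except AnalysisException as e:
    print("schema enforcement kicked in:", e)
```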
Introduction to SQL++ for Big Data: Same Language, More Power – All Things Open
SQL++ is a query language that extends SQL to enable analytics on NoSQL data stored in JSON documents. It allows SQL queries to be run directly on JSON data without requiring an ETL process to move data into a relational database first. SQL++ supports features like querying nested objects and arrays in documents as well as aggregation functions. Several database systems like Couchbase, AsterixDB, and Apache Drill support SQL++.
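A hedged sketch of a SQL++ query over nested JSON, issued through the Couchbase Python SDK; the bucket, credentials, and order documents are assumptions for illustration.

```python
# SQL++ over nested JSON via the Couchbase Python SDK.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster("couchbase://localhost",
                  ClusterOptions(PasswordAuthenticator("user", "password")))

# UNNEST flattens the nested "items" array inside each order document --
# no ETL into relational tables required.
result = cluster.query("""
    SELECT o.customer, i.sku, SUM(i.qty) AS total_qty
    FROM orders AS o
    UNNEST o.items AS i
    GROUP BY o.customer, i.sku
""")
for row in result:
    print(row)
```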
This document discusses building reactive database drivers with R2DBC. It covers how Spring supports reactive database access, an introduction to R2DBC including its design principles and SPI. It then discusses how R2DBC drivers work internally and provides examples of using R2DBC with Spring Data repositories. While R2DBC brings reactivity to database access, it is still immature and has limitations compared to blocking approaches like JPA.
CERN’s Next Generation Data Analysis Platform with Apache Spark with Enric Te... – Databricks
The CERN experiments and their particle accelerator, the Large Hadron Collider (LHC), will soon have collected a total of one exabyte of data. Moreover, the next upgrade of the accelerator, the high-luminosity LHC, will dramatically increase the rate of particle collisions, thus boosting the potential for discoveries but also generating unprecedented data challenges.
In order to process and analyse all those data, CERN is investigating complementary ways to the traditional approaches, which mainly rely on Grid and batch jobs for data reconstruction, calibration and skimming combined with a phase of local analysis of reduced data. The new techniques should allow for interactive analysis on much bigger datasets by transparently exploiting dynamically pluggable resources.
In that sense, Spark is being used at CERN to process large physics datasets in a distributed fashion. The most widely used tool for high-energy physics analysis, ROOT, implements a layer on top of Spark in order to distribute computations across a cluster of machines. This makes it possible for physics analysis written in either C++ or Python to be parallelised on Spark clusters, while reading the input data from CERN’s mass storage system: EOS. On the other hand, another important use case of Spark at CERN has recently emerged.
The LHC logging service, which collects data from the accelerator to get information on how to improve the performance of the machine, is currently migrating its architecture to leverage Spark for its analytics workflows. This talk will discuss the unique challenges of the aforementioned use cases and how SWAN, the CERN service for interactive web-based analysis, now supports them thanks to a new feature: the possibility for users to dynamically plug Spark clusters into their sessions in order to offload computations to those resources.
Grokking Engineering – Data Analytics Infrastructure at Viki – Huy Nguyen
This document outlines Viki's analytics infrastructure, including data collection, storage, processing, and visualization. It discusses collecting behavioral data from various sources and storing it in Hadoop. Data is centralized, cleaned, transformed, and loaded into a PostgreSQL data warehouse for analysis. Real-time data is processed using Apache Storm and visualized on dashboards and alerts. Technologies used include Ruby, Python, Java, Hadoop, Hive, and Amazon Redshift for analytics and PostgreSQL, MongoDB, and Redis for transactional data.
Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta... – Lucidworks
This document summarizes a presentation about Microsoft's Log Analytics SaaS service, which uses Apache Solr. It discusses the challenges of supporting a multi-tenant service at scale, including bottlenecks in Solr Cloud and performance issues with wide queries. The presentation describes Microsoft's approach to addressing these challenges through workload management across Solr clusters, centralized configuration, and querying cold storage clusters to improve query performance. It concludes by discussing next steps to further optimize Solr for the log analytics scenario.
This document discusses different NoSQL database technologies for various application requirements. It describes graph databases like Neo4j, document databases like MongoDB, and column family databases like Cassandra. It then provides examples of using each for a blog system, Twitter clone, and social network. Graph databases are well-suited for the social network due to focusing on entity relationships. Document databases work well for the blog by embedding comments in blog posts. A column family database is a good fit for the Twitter clone to handle high write loads through denormalization across column families.
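The blog example can be sketched in a few lines of pymongo, showing why embedding comments avoids a join; the collection and field names are illustrative.

```python
# Embedding comments inside the blog-post document, pymongo style.
from datetime import datetime, timezone
from pymongo import MongoClient

posts = MongoClient()["blog"]["posts"]

post_id = posts.insert_one({
    "title": "Why documents fit blogs",
    "body": "...",
    "comments": [],           # comments live inside the post document
}).inserted_id

# One atomic update appends a comment -- no join table, and one read
# serves the whole page.
posts.update_one(
    {"_id": post_id},
    {"$push": {"comments": {"author": "bob",
                            "text": "Nice post!",
                            "at": datetime.now(timezone.utc)}}},
)
```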
Flink Community Update December 2015: Year in Review – Robert Metzger
This document summarizes the Berlin Apache Flink Meetup #12 that took place in December 2015. It discusses the key releases and improvements to Flink in 2015, including the release of versions 0.10.0 and 0.10.1, and new features that were added to the master branch, such as improvements to the Kafka connector. It also lists pending pull requests, recommended reading, and provides statistics on Flink's growth in 2015 in terms of GitHub activity, meetup groups, organizations at Flink Forward, and articles published.
This presentation held in at Inovex GmbH in Munich in November 2015 was about a general introduction of the streaming space, an overview of Flink and use cases of production users as presented at Flink Forward.
Streaming ETL to Elastic with Apache Kafka and KSQL – Confluent
Companies are recognizing the importance of a low-latency, scalable, fault-tolerant data backbone, in the form of the Apache Kafka streaming platform. With Kafka, developers can integrate multiple sources and systems, enabling low-latency analytics, event-driven architectures, and the population of multiple downstream systems. These data pipelines can be built using configuration alone.
In this talk we’ll see how easy it is to stream data from sources such as databases and into Kafka using the Kafka Connect API. We’ll use KSQL to filter, aggregate and join it to other data, and then stream this enriched data from Kafka out into targets such as Elasticsearch. All of this can be accomplished without a single line of code!
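The talk drives KSQL from its CLI; as a hedged stand-in, the same kind of statements can be posted to ksqlDB’s REST endpoint from Python (the topic, stream, and field names are invented).

```python
# Issuing KSQL statements over ksqlDB's REST API with requests.
import requests

KSQL_URL = "http://localhost:8088/ksql"

statements = """
    CREATE STREAM orders (id VARCHAR, amount DOUBLE, country VARCHAR)
        WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');
    CREATE STREAM big_eu_orders AS
        SELECT * FROM orders WHERE country = 'DE' AND amount > 100;
"""

resp = requests.post(KSQL_URL, json={"ksql": statements,
                                     "streamsProperties": {}})
print(resp.json())
```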
NoSQL databases are currently used in several application scenarios, in contrast to relational databases. Several types of NoSQL database exist. In this presentation we compare key-value, column-oriented, document-oriented, and graph databases. Using a simple case study, the pros and cons of the NoSQL databases under consideration are evaluated.
MongoDB Europe 2016 – Using MongoDB to Build a Fast and Scalable Content Repository – MongoDB
MongoDB can be used in the Nuxeo Platform as a replacement for more traditional SQL databases. Nuxeo's content repository, which is the cornerstone of this open source enterprise content management platform, integrates completely with MongoDB for data storage. This presentation will explain the motivation for using MongoDB and will emphasize the different implementation choices driven by the very nature of a NoSQL datastore like MongoDB. Learn how Nuxeo integrated MongoDB into the platform which resulted in increased performance (including actual benchmarks) and better response to some use cases.
The document discusses query mechanisms for NoSQL databases. It begins by describing how relational databases require normalization of data into tables and use SQL for queries. NoSQL databases are introduced as being non-relational, schema-free, and having simple APIs. Document stores are highlighted as a type of NoSQL database that can natively store hierarchical data without normalization. Specific document stores like CouchDB and MongoDB are described, showing how data can be stored and queried in documents through HTTP requests or a mongo client. Map-reduce functions are also discussed as a way to perform complex queries across collections of documents.
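Both query mechanisms can be shown side by side from Python; the database, document, and field names below are invented for the example.

```python
# CouchDB's plain HTTP interface versus MongoDB's client API.
import requests
from pymongo import MongoClient

# CouchDB: every document is addressable with an HTTP request.
doc = requests.get("http://localhost:5984/articles/article-42").json()
print(doc)

# MongoDB: the client queries hierarchical documents directly,
# no normalization needed.
articles = MongoClient()["cms"]["articles"]
for a in articles.find({"author.name": "alice"}, {"title": 1}):
    print(a["title"])
```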
This is just an overview of how to present these slides, which describe how the software works. It is a general style of presentation; don't worry about the forms shown inside.
The document describes a project for a hospital management system. The project was submitted to fulfill degree requirements and automate operations for a small hospital. It includes developing databases to store information on patients, doctors, staff, diagnoses, and bills. Entity relationship diagrams and tables were designed for the logical and physical database structures. The system allows admission of patients, storing their details and appointments, doctor consultations, prescriptions, and billing. It aims to computerize a hospital's operations and provide effective storage and reports on patient information.
The document describes a proposed hospital management system (HMS) that aims to automate and standardize a hospital's management processes. Currently, hospitals rely on manual paper-based systems that are inefficient and prone to errors. The HMS would control key information like patient data, schedules, and invoices electronically. It would make hospital management more efficient and reduce errors by standardizing data and ensuring integrity across information systems. The system design involves modules for registration, pharmacy, doctors, reception, laboratory, and discharge summaries. The technical requirements specify technologies like ASP.NET, C#, and SQL Server for development. UML diagrams, including use cases, sequences, and classes, are used for design. Data flow diagrams and entity-relationship diagrams model the system's data flows and structure.
This document provides an overview and requirements for developing a Hospital Management System. It describes collecting both primary and secondary data. Key objectives of the system are to computerize patient and hospital details, schedule appointments and services, update medical store inventory, handle test reports, and keep patient information up-to-date. The system will have modules for login, patients, doctors, billing, and generating reports. It will use a relational database with tables for patient, doctor, room, and bill details.
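A minimal sketch of the kind of relational design these reports describe, using Python's built-in sqlite3 so it runs anywhere; the table and column names are plausible guesses, not the projects' actual schemas.

```python
# Patient, doctor, and bill tables plus a simple billing report.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        admitted   DATE
    );
    CREATE TABLE doctor (
        doctor_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        speciality TEXT
    );
    CREATE TABLE bill (
        bill_id    INTEGER PRIMARY KEY,
        patient_id INTEGER REFERENCES patient(patient_id),
        amount     REAL NOT NULL
    );
""")

conn.execute("INSERT INTO patient (name, admitted) VALUES ('J. Doe', '2024-01-05')")
conn.execute("INSERT INTO bill (patient_id, amount) VALUES (1, 250.0)")

# A report joining patients to their bills, as a billing module would produce.
for row in conn.execute("""
        SELECT p.name, SUM(b.amount)
        FROM patient p JOIN bill b ON b.patient_id = p.patient_id
        GROUP BY p.name"""):
    print(row)
```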
The document discusses different cloud computing stacks, including CloudStack and OpenStack. It provides details on the components and features of each stack. CloudStack is presented as a console for managing data center resources like virtual machines, networking, and storage. It enables IaaS capabilities. OpenStack is described as an open source software for building public and private clouds, with components that manage compute, storage, networking, identity, and dashboards. It supports multiple hypervisors and is used by many large companies.
The document discusses the Google Filesystem (GFS). It was designed by Google to meet its massive storage needs. The GFS architecture consists of a single master node that manages metadata, and multiple chunkservers that store file data sliced into fixed-size chunks. Each chunk is replicated on multiple servers for reliability. The master handles tasks like chunk leasing, migration, and garbage collection.
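A worked example of the fixed-size chunking described above; 64 MB is the chunk size reported in the GFS paper, while the file size and replication factor below are arbitrary.

```python
# GFS-style fixed-size chunking: a byte offset maps to a chunk index,
# and the master only tracks which chunkservers hold each chunk.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

def chunk_for_offset(offset: int) -> tuple[int, int]:
    """Return (chunk index, offset within that chunk) for a file offset."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

file_size = 1_000_000_000                  # ~1 GB file
num_chunks = -(-file_size // CHUNK_SIZE)   # ceiling division -> 15 chunks
print(num_chunks, chunk_for_offset(200 * 1024 * 1024))  # chunk 3, 8 MB in

# With 3-way replication the cluster stores num_chunks * 3 chunk replicas.
print("replicas:", num_chunks * 3)
```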
Topic 8: Enhancements and Alternative Architectures – Zubair Nabi
The document discusses several enhancements and alternative architectures to MapReduce, including Pig Latin, Dryad, CIEL, and Naiad. It provides an overview of each system, highlighting their improvements over MapReduce such as supporting more complex dataflow graphs beyond the two-stage map and reduce model. Pig Latin is described as a declarative language that compiles to physical plans executed on Hadoop, while Dryad generalizes MapReduce to allow arbitrary directed acyclic graphs.
The document provides an introduction and overview of MongoDB, including what NoSQL is, the different types of NoSQL databases, when to use MongoDB, its key features like scalability and flexibility, how to install and use basic commands like creating databases and collections, and references for further learning.
The Power of Relationships in Your Big Data – Paulo Fagundes
The document provides an overview of Oracle NoSQL Database Release 3.0, including new features such as table data modeling, secondary indexing, data centers for disaster recovery, and security enhancements. Best practices are discussed for choosing a data model, using indexes, and configuring data centers and zones.
The document provides an overview of NoSQL databases. It discusses relational database systems and SQL, and then poses questions about what, why, and when NoSQL databases are used. It outlines some key advantages and disadvantages of NoSQL databases, and categories including document stores, key-value stores, column family stores, and graph databases. Some current applications are highlighted, along with distinguishing characteristics of NoSQL databases compared to relational databases. Finally, the CAP theorem is introduced as an important concept regarding consistency, availability, and partition tolerance in distributed systems.
The document outlines Oracle's general product direction for its database products. It discusses initiatives around database as a service, big data, and cloud computing. It provides a brief look back at Oracle Database 12c releases in 2013 and previews what is coming next in 2014, including Oracle Database 12c on new platforms and the introduction of a new backup and recovery appliance. The document also discusses a focus on database as a service using Oracle tools and Exadata and provides an update on testing and feedback for Oracle Database In-Memory.
Presentation given by Akmal Chaudhri (Hortonworks) to the BCS Data Management Specialist Group on 24th October 2013.
The presentation provides a balanced view of the state of NoSQL technology and tools and options for selection on projects.
A video of the presentation is available on YouTube at https://www.youtube.com/watch?v=FYfJ8C_YcvI
How to bake a Customer Story with Windows, NVM-e, Data Guard, ACFS Snapshots – Ludovico Caldara
This document describes a new solution implemented by Trivadis to address a customer's need to clone databases faster. The previous solution took 2 hours to clone a 300GB database. The new solution leverages Oracle Data Guard, NVM-e, ACFS snapshots, bash scripts, Linux, and Windows with Perl to enable cloning a database within minutes. Key aspects of the new architecture include using ACFS snapshots to quickly copy data, placing components like GRID infrastructure and databases on high-performance NVM-e storage, and automating the cloning process with scripts. This provides faster database clones while avoiding costly additional technologies.
Ability to define data targets in CloverDX Data Catalog and Wrangler to allow you to connect and write your data to any system.
New mapping mode in Wrangler will help you transform incoming data into the required layout.
Integrate your Wrangler transformations into Designer-built processes ensuring that your domain experts/business users can effectively collaborate with your data engineering team.
New validation steps in CloverDX Wrangler will help you quickly validate your data and increase confidence in your results.
New Snowflake and Google BigQuery connectors in CloverDX Marketplace. Snowflake connector allows you to write to Snowflake from your Wrangler jobs while BigQuery is designed for high-performance writes from your graphs.
Other features, including:
Health check job for your libraries to allow you to monitor connectivity to your sources and targets
Support for CloverDX Server deployments on Java 17 for increased performance and security
Platform updates and security fixes
Usability improvements
On July 6, 2021, MariaDB 10.6 became generally available (production ready). This presentation focuses on the most important aspects of it as well as the influence it has. Improvements to InnoDB, SYS Schema Adoption, and deprecated variables and engines are all part of this presentation.
Topic 15: Datacenter Design and Networking – Zubair Nabi
The document discusses datacenter network design and transport protocols. It begins with an introduction to traditional datacenter network topologies, which use a 2-3 level tree structure. It then covers fat-tree and DCell topologies as alternatives. The document also discusses how TCP, while commonly used, is not optimal for datacenter networks due to design assumptions like round-trip time that differ from wide-area networks. It suggests transport protocols designed for datacenter characteristics could improve performance.
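A worked example of the fat-tree sizing implied above: built from identical k-port switches, a fat-tree has k pods, (k/2)^2 core switches, and supports k^3/4 hosts at full bisection bandwidth.

```python
# Fat-tree sizing from the switch port count k.
def fat_tree(k: int) -> dict:
    assert k % 2 == 0, "k must be even"
    return {
        "pods": k,
        "core_switches": (k // 2) ** 2,
        "edge_and_agg_switches": k * k,  # k/2 edge + k/2 agg per pod, k pods
        "hosts": k ** 3 // 4,
    }

print(fat_tree(4))   # toy example: 16 hosts
print(fat_tree(48))  # commodity 48-port switches: 27,648 hosts
```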
Solution Use Case Demo: The Power of Relationships in Your Big Data – InfiniteGraph
In this security solution demo, we have integrated Oracle NoSQL DB with InfiniteGraph to demonstrate the power of using the right tools for the solution. By integrating the key value technology of Oracle with the InfiniteGraph distributed graph database, we are able to create new views of existing Call Detail Record (CDR) details to enable discovery of connections, paths and behaviors that may otherwise be missed.
Discover how to add value to your existing Big Data to increase revenues and performance!
The document discusses the rise of NoSQL databases as an alternative to traditional relational databases. It provides a brief history of NoSQL, noting that new types of applications and data led developers to look for databases that offer more flexibility and scalability. It also describes the main types of NoSQL databases - key-value stores, graph stores, column stores, and document stores - and discusses some of the advantages of NoSQL databases like flexibility, scalability, availability and lower costs.
This document provides an overview of Neo4j's vision and roadmap. It discusses Neo4j's goal of being a modern, enterprise data platform that can power both operational and analytical workloads. Key aspects of Neo4j's strategy include building a fully cloud-native database designed for operational and analytical graph workloads, with autonomous clustering to provide unlimited horizontal scalability. The document also briefly reviews recent Neo4j releases and highlights some new features like graph pattern matching and change data capture.
This issue of Dr. Dobb's Journal discusses various topics related to big data. The guest editorial discusses how after distancing themselves from SQL, NoSQL products are now moving toward more transactional models as "NewSQL" gains popularity. An article applies the lambda architecture to a Hadoop project matching social media connections. Another article discusses using Storm for real-time big data analysis as an alternative to Hadoop. The issue also includes news briefs on tools and platforms, an open-source dashboard, and an article on understanding what big data can deliver.
This document provides an overview of NoSQL databases. It discusses the key features of NoSQL, including that it has no fixed schema and relaxes strict ACID guarantees. Cassandra is presented as a popular example of a NoSQL database, with its ability to handle large amounts of structured data with no single point of failure. The document compares NoSQL to SQL databases, noting NoSQL's advantages in scalability and performance.
The document discusses the new features of MySQL 8.0. It covers improvements to SQL functionality with common table expressions, window functions, and JSON support. It also discusses performance enhancements including hash joins, faster I/O with the new InnoDB buffer, and group replication for high availability. New features improve security, validation, indexing and usability.
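A hedged sketch combining two of the named features, a common table expression feeding a window function, run through mysql-connector-python; the connection details and sales table are placeholders.

```python
# CTE + window function (both new in MySQL 8.0) from Python.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="demo",
                               password="demo", database="shop")
cur = conn.cursor()
cur.execute("""
    WITH monthly AS (
        SELECT DATE_FORMAT(sold_at, '%Y-%m') AS month,
               SUM(amount) AS revenue
        FROM sales
        GROUP BY month
    )
    SELECT month,
           revenue,
           SUM(revenue) OVER (ORDER BY month) AS running_total
    FROM monthly
""")
for month, revenue, running_total in cur:
    print(month, revenue, running_total)
```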
This document discusses network communication in Unix systems. It describes how the networking infrastructure abstracts different network architectures and consists of network protocols, address families, and additional facilities. It also summarizes the network subsystem layers, memory management using mbufs, data flow between sockets and the network, common network protocols, network interfaces, routing, and protocol control blocks.
This document discusses the background and advantages of virtualization. It describes how IBM originally solved the problem of running multiple operating systems on the same machine by adding a virtual machine monitor, or hypervisor. The hypervisor sits between operating systems and hardware, giving each OS the illusion of full hardware control while actually multiplexing hardware access. This allows server consolidation by running multiple OSes on fewer physical servers. The document then discusses challenges of virtualizing privileged operations, system calls, and virtual memory that require interception and emulation by the hypervisor.
AOS Lab 10: File system -- Inodes and beyond – Zubair Nabi
This document provides a summary of file system concepts in the xv6 operating system, including:
1) Inodes are data structures that represent files and provide metadata and pointers to file data blocks. On-disk inodes are read into in-memory inodes when files are accessed.
2) Directories are represented by special directory inodes containing directory entries with names and pointers to other inodes.
3) The file system layout divides the disk into sections for the boot sector, superblock, inodes, bitmap, data blocks, and log for atomic transactions.
AOS Lab 9: File system -- Of buffers, logs, and blocks – Zubair Nabi
The document describes the file system layers in xv6, including the buffer cache, logging, and on-disk layout. The buffer cache synchronizes access to disk blocks and caches popular blocks in memory. The logging layer ensures atomicity by wrapping file system updates in transactions written to a log on disk before writing to the file system structures. The on-disk layout divides the disk into sections for the boot sector, superblock, inodes, bitmap, data blocks, and log blocks.
AOS Lab 8: Interrupts and Device Drivers – Zubair Nabi
This document discusses interrupts, device drivers, and the xv6 operating system. It provides recaps of previous labs on extraordinary events like interrupts, exceptions, and system calls. It explains how interrupts are handled on multi-processor systems using the I/O APIC to route interrupts and the LAPIC as a per-CPU interrupt controller. An example is given of how timer interrupts are used to track time and scheduling. Device drivers are introduced as code that manages devices by providing interrupt handlers and controlling device operations. The disk driver is given as an example to copy data between disk and memory in 512-byte sectors.
Page tables allow the OS to multiplex process address spaces onto physical memory, protect memory between processes, and map kernel memory in user address spaces. Page tables are stored as a two-level tree structure with a page directory and page table pages. Virtual addresses are translated to physical addresses by indexing the page directory and table to obtain the physical page number in the page table entry.
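The translation path can be made concrete with the standard 10/10/12-bit split of a 32-bit x86 virtual address; the sample address is arbitrary.

```python
# Two-level x86 translation: a 32-bit virtual address splits into a
# 10-bit page directory index, a 10-bit page table index, and a 12-bit
# offset within the 4 KB page.
def split_va(va: int) -> tuple[int, int, int]:
    pdx = (va >> 22) & 0x3FF   # page directory index (top 10 bits)
    ptx = (va >> 12) & 0x3FF   # page table index (middle 10 bits)
    off = va & 0xFFF           # offset into the 4 KB page (low 12 bits)
    return pdx, ptx, off

va = 0x00803004
print(tuple(hex(x) for x in split_va(va)))  # ('0x2', '0x3', '0x4')
```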
The document discusses process scheduling in an operating system. It describes how an OS runs more processes than it has processors by providing each process with a virtual processor and multiplexing these across physical processors. When a process performs I/O or its time quantum expires, the scheduler selects another process to run using a timer interrupt. Context switching involves saving the context of the current process and restoring the next process using the swtch function. The scheduler runs in a loop, acquiring the process table lock to select a RUNNABLE process and releasing it to allow other CPUs access between iterations.
The document discusses system calls and how they are handled in operating systems. It explains that system calls allow user processes to request services from the kernel by generating an interrupt that switches the processor into kernel mode. On x86 processors, the interrupt handler saves process state and routes the call to the appropriate kernel code based on an interrupt descriptor table with 256 entries. The document provides details on how Linux/x86 implements system calls, exceptions, and interrupts using the IDT, and switches between user and kernel mode to maintain isolation.
AOS Lab 4: If you liked it, then you should have put a “lock” on it – Zubair Nabi
The document discusses concurrency issues that arise in operating systems and how xv6 handles them using locks. It begins by explaining how multiple CPUs can interfere with each other when sharing kernel data structures. It also notes that even on single-CPU systems, interrupt handlers can interfere with non-interrupt code. xv6 uses locks to address concurrency for both of these situations. The document then provides examples of race conditions that can occur without locks, such as when multiple processors concurrently add to a shared linked list. It shows how xv6 implements locks and how they are used to make operations like inserting into a linked list atomic. The document also discusses challenges like lock ordering, handling locks for interrupt handlers, and when to use coarse-grained locks.
The document describes the process of booting a PC and starting the first process. It explains that when a PC boots, the BIOS starts executing and loads the boot loader from the boot sector of the boot disk. The boot loader then loads the kernel into memory and jumps to it. The kernel then initializes devices and creates the first process by setting up its page table and memory space. The first process's state is set to runnable and the scheduler runs it, switching to its address space. The first process makes a system call to load the /init program, which creates the console and the shell that runs as the main process.
1) xv6 is a reimplementation of the Unix Version 6 operating system (V6) in ANSI C. It is used at MIT for teaching operating systems concepts.
2) The document discusses installing xv6 on a system by cloning its source code from GitHub and compiling it. Key steps include installing dependencies, QEMU, and cloning the xv6 source code.
3) An overview of xv6's structure is provided, noting it is a monolithic kernel that provides services to user processes via system calls, allowing processes to alternate between user and kernel space.
This document provides an introduction to Linux and common Linux commands. It discusses key facts about Unix, how Linux is based on Unix, popular Linux distributions like Ubuntu, and common file system layout and commands for manipulating files and directories. The document concludes with an assignment to write a Bash script to analyze and compare British and American English dictionaries.
The document summarizes the key components of the big data stack, from the presentation layer where users interact, through various processing and storage layers, down to the physical infrastructure of data centers. It provides examples like Facebook's petabyte-scale data warehouse and Google's globally distributed database Spanner. The stack aims to enable the processing and analysis of massive datasets across clusters of servers and data centers.
Raabta: Low-cost Video Conferencing for the Developing World (Zubair Nabi)
This document proposes Raabta, a low-cost video conferencing system for developing regions. Raabta leverages existing analog cable TV networks and uses inexpensive Raspberry Pi devices as endpoints. It was designed with principles of low cost, low power usage, tolerance of failure-prone environments, and a simple interface. The system avoids reliance on internet connectivity by using the cable networks for both upstream and downstream video streams encoded for robust transmission. This approach could enable affordable, widespread communication tools for communities with limited infrastructure and resources.
The Anatomy of Web Censorship in Pakistan (Zubair Nabi)
This document summarizes a study on internet censorship in Pakistan. It found that censorship mechanisms in Pakistan were upgraded in mid-2013 from ISP-level blocking to centralized blocking at the internet exchange point (IXP) level. Most websites were blocked through DNS redirection, while some used HTTP redirection. After the upgrade, blocking was done by injecting HTTP 200 response packets at the IXP level. Public VPNs and web proxies were popular ways for citizens to circumvent restrictions.
This document discusses Hive, an open source data warehousing system built on top of Hadoop. Hive allows users to query data stored in Hadoop using a SQL-like language called HiveQL. Queries are compiled into MapReduce jobs for execution. The document describes Hive's data model, data types, HiveQL language, and metastore. It provides an example of using Hive to analyze Facebook status updates.
This document discusses MapReduce application scripting. It provides an overview of Pig Latin and Cascading, two frameworks for writing MapReduce applications in a declarative way. Pig Latin expresses data flows as a sequence of steps and allows custom user-defined functions. Cascading allows creating MapReduce pipelines in JVM languages using a source-pipe-sink paradigm. The document defines key terminology and provides examples of MapReduce jobs written in Pig Latin.
Topic 14: Operating Systems and Virtualization (Zubair Nabi)
The document discusses operating systems and virtualization. It provides an overview of several Linux distributions including their key features and use cases. It also describes Xen, a hypervisor used to run multiple virtual machines on a single physical machine. Xen uses a dom0 domain to control hardware access and export virtual devices to domU guest virtual machines. I/O is handled through backend and frontend device drivers in the dom0 and domUs respectively.
Lab 5: Interconnecting a Datacenter using Mininet (Zubair Nabi)
This document discusses using Mininet, an emulator for real-world networks that uses real kernel, switch, and application code on a single machine. It describes how Mininet uses Linux containers to emulate hosts, switches, and links. It also explains that Mininet creates a container and network namespace for each virtual host, with virtual interfaces connecting hosts to software switches via veth links. Finally, it briefly outlines Mininet's command line and Python interfaces.
This document discusses interfacing with the Cassandra database using Python. It introduces Cassandra as a column-based key-value store and describes how to create keyspaces and column families using the Cassandra CLI. It then explains how to interface with Cassandra from Python using the pycassa package, including connecting to Cassandra, performing insert, read, delete, batch, and slice operations on column families.
What is an RPA CoE? Session 1 – CoE Vision (DianaGray10)
In the first session, we will review the organization's vision and how it shapes the CoE structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect, Anika Systems
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
In this talk we will discuss DDoS protection tools and best practices, network architectures, and what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022, and see which techniques helped keep web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also practices that can lead to unnecessary expenses, for example using a person document instead of a mail-in database for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new license model itself.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and know-how to keep an overview, so you can reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leverage this data for RAG and other GenAI use cases, and finally chart your course to production.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip, presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-axelera-ai-uses-digital-compute-in-memory-to-deliver-fast-and-energy-efficient-computer-vision-a-presentation-from-axelera-ai/
Bram Verhoef, Head of Machine Learning at Axelera AI, presents the “How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-efficient Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
As artificial intelligence inference transitions from cloud environments to edge locations, computer vision applications achieve heightened responsiveness, reliability and privacy. This migration, however, introduces the challenge of operating within the stringent confines of resource constraints typical at the edge, including small form factors, low energy budgets and diminished memory and computational capacities. Axelera AI addresses these challenges through an innovative approach of performing digital computations within memory itself. This technique facilitates the realization of high-performance, energy-efficient and cost-effective computer vision capabilities at the thin and thick edge, extending the frontier of what is achievable with current technologies.
In this presentation, Verhoef unveils his company’s pioneering chip technology and demonstrates its capacity to deliver exceptional frames-per-second performance across a range of standard computer vision networks typical of applications in security, surveillance and the industrial sector. This shows that advanced computer vision can be accessible and efficient, even at the very edge of our technological ecosystem.
Dandelion Hashtable: beyond billion requests per second on a commodity server (Antonios Katsarakis)
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, which go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server with a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf (Chart Kalyan)
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of the presentation accompanying the speech I gave about the main changes brought by the CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The video recording (in Czech) of the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Topic 12: NoSQL in Action
1. 12: NoSQL in Action
Zubair Nabi
zubair.nabi@itu.edu.pk
April 20, 2013
2. Outline
1 Amazon’s Dynamo
2 MongoDB
3 Google BigTable
4 Cassandra
3. Outline: Amazon’s Dynamo
5. Introduction
Dynamo is at the forefront of the NoSQL movement and has influenced the design of many subsequent systems
Design considerations are two-fold: 1) Infrastructure and 2) Business
8. Infrastructure Considerations
Tens of thousands of servers and network elements distributed across the globe
Commodity off-the-shelf hardware
Failure is normal
Hundreds of services, all decentralized and loosely coupled
13. Business Considerations
Strict, internal SLAs regarding performance, reliability, and efficiency
Reliability is of paramount importance because an outage means loss of revenue and customer trust
The platform needs to be highly scalable to support continuous growth
Most services only store and retrieve data by primary key, such as best-seller lists, shopping carts, etc.
No need for the complex querying and management afforded by an RDBMS
18. Design
1 Implemented as a partitioned system with replication and consistency windows
2 Targets applications that require weaker consistency
3 Gives high availability
4 Allows write operations even in the presence of partitioning amongst replicas
5 Always writeable, so conflict resolution needs to happen during reads
22. Conflict Resolution
A datastore can only perform simple conflict resolution
Dynamo passes the buck to the application
The application is aware of the data schema and hence better suited to choose a conflict resolution mechanism
If the application does not want to implement conflict resolution, simple mechanisms such as “last write wins” are provided by the framework
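Dynamo's full reconciliation relies on version metadata (vector clocks) carried in the request context; purely as a toy sketch of the framework-provided “last write wins” fallback, assuming versions tagged with comparable timestamps (all names hypothetical):

```python
# Hypothetical versioned values: (value, timestamp) pairs returned by a read
def last_write_wins(conflicting_versions):
    """Keep only the version with the newest timestamp (ties broken arbitrarily)."""
    return max(conflicting_versions, key=lambda version: version[1])

versions = [("cart-with-book", 1001), ("cart-with-book-and-dvd", 1007)]
print(last_write_wins(versions))  # ('cart-with-book-and-dvd', 1007)
```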
25. Interface
1 Simple key/value interface storing values as BLOBs
2 Operations limited to one key/value pair at a time
3 No support for hierarchical namespaces (like those in filesystems)
27. Node Assignment
Completely decentralized, so all nodes have equal responsibilities
As nodes can be heterogeneous, work is distributed in proportion to the capabilities of each node
33. Operations
Provides two operations (a toy sketch follows this slide):
1 get(key), which returns a list of objects and a context
2 put(key, context, object)
get can return more than one object if there are conflicting versions
The context contains system metadata, such as the object version
Keys and values are stored as arrays of bytes and are interpreted only by the application
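A minimal in-process sketch of what this two-operation interface could look like to a client; this is an illustrative stand-in, not Amazon's real API, and the context is treated as an opaque token:

```python
class ToyDynamoClient:
    """In-process stand-in for a Dynamo-style get/put interface."""

    def __init__(self):
        self._store = {}  # key -> list of (context, value) versions

    def get(self, key):
        """Return all (possibly conflicting) versions plus an opaque context."""
        versions = self._store.get(key, [])
        context = {"seen": [ctx for ctx, _ in versions]}  # system metadata
        return [value for _, value in versions], context

    def put(self, key, context, value):
        # A real store would use the context's version info (vector clocks)
        # to supersede only the versions the client has seen; this sketch
        # simply replaces them all.
        self._store[key] = [(context, value)]

client = ToyDynamoClient()
client.put(b"cart:42", {"seen": []}, b"book,dvd")  # values are opaque bytes
values, ctx = client.get(b"cart:42")
```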
36. Partitioning
The MD5 hash of a key determines its storage node
Consistent hashing provides incremental scalability
Partitioning is done across virtual nodes instead of physical ones to take hardware heterogeneity into account
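A minimal sketch of this partitioning scheme, assuming MD5-based consistent hashing with a fixed number of virtual nodes per physical node; a real system would vary the virtual-node count with each node's capacity:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Illustrative consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=8):
        self._ring = []  # sorted list of (hash_position, physical_node)
        for node in nodes:
            for i in range(vnodes_per_node):
                pos = self._hash(f"{node}#vnode{i}")
                bisect.insort(self._ring, (pos, node))

    @staticmethod
    def _hash(key):
        # MD5, as the slide describes, mapped onto the ring's key space
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise from the key's position to the next virtual node."""
        pos = self._hash(key)
        idx = bisect.bisect(self._ring, (pos, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("shopping-cart:42"))
```

Because only the virtual nodes adjacent to a change move, adding or removing a physical node remaps a small fraction of keys, which is what gives the scheme its incremental scalability.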
37. Outline: MongoDB
40. Introduction
Schemaless document database written in C++
Used by a large number of organizations, including SourceForge.net, foursquare, the New York Times, bit.ly, Craigslist, SAP, MTV, EA Sports, GitHub, etc.
Databases are distributed over multiple servers
44. Databases and Collections
Databases contain collections (“named groupings”) of documents
Documents within a collection might be heterogeneous
But a good strategy is to create one collection per object type
A collection is created automatically when the first document is inserted into it
47. Hierarchical Namespaces
Documents can be organized into a hierarchical structure using a dot notation
For instance, the collections wiki.articles, wiki.categories, and wiki.authors exist within the namespace wiki
The collection namespace itself is flat; the hierarchical structure exists only for the user
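As an illustration with the modern PyMongo driver (which postdates this deck; insert_one has since replaced the shell-style insert), note that the “hierarchy” is purely cosmetic:

```python
from pymongo import MongoClient  # assumes a MongoDB server on localhost

db = MongoClient()["wikidb"]  # hypothetical database name

# The dot notation only *looks* hierarchical; to the server these are three
# flat collection names that happen to share the "wiki." prefix.
db["wiki.articles"].insert_one({"title": "NoSQL", "author_id": 1})
db["wiki.categories"].insert_one({"name": "Databases"})
db["wiki.authors"].insert_one({"_id": 1, "name": "Zubair"})

print(db.list_collection_names())  # ['wiki.articles', 'wiki.authors', ...]
```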
52. Documents
The unit of data storage
Conceptually similar to an XML document, JSON document, etc.
Documents are persisted in Binary JSON (BSON)
Easy to convert between BSON and JSON, and between BSON and other programming-language structures
Possible to insert (insert), search (find), and update (save) a document
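A short PyMongo sketch of the three operations named above; in current drivers the shell's save() is expressed as a full-document replace with upsert (collection and field names here are hypothetical):

```python
from pymongo import MongoClient

articles = MongoClient()["wikidb"]["wiki.articles"]

# insert: the collection (and its database) is created on first insert
doc_id = articles.insert_one({"title": "BSON", "tags": ["mongodb"]}).inserted_id

# find: query by example document
doc = articles.find_one({"title": "BSON"})

# save: replace the stored document with a modified one
doc["tags"].append("bson")
articles.replace_one({"_id": doc_id}, doc, upsert=True)
```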
57. Datatypes
Scalar: boolean, integer, double
Character sequence: string, code, etc.
BSON objects: object
Object ID: to identify documents within a collection
Misc: null, array, date
59. References
No mechanism for foreign keys
References between documents need to be resolved by client applications
65. Transaction Properties
Atomicity only for update and delete operations
Allows code to be executed locally on database nodes (server-side code execution)
Three different strategies for server-side execution (the aggregation strategy is sketched after this slide):
1 Execution of arbitrary code on a single node via the eval operator
2 Aggregation via count, group, and distinct
3 MapReduce code execution on multiple nodes
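A hedged PyMongo sketch of server-side aggregation: the deck's eval and group operators have since been removed from MongoDB, so the modern aggregation pipeline stands in for them here, and count/distinct map to their current driver methods:

```python
from pymongo import MongoClient

articles = MongoClient()["wikidb"]["wiki.articles"]

# Aggregation executes on the server, next to the data:
n_tagged = articles.count_documents({"tags": "mongodb"})  # count
all_tags = articles.distinct("tags")                      # distinct

# The 2013-era group() is gone; the aggregation pipeline is its
# server-side replacement (MapReduce jobs play a similar multi-node role):
per_author = list(articles.aggregate([
    {"$group": {"_id": "$author_id", "articles": {"$sum": 1}}},
]))
```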
66. Outline: Google BigTable
71. Introduction
Supports a relaxed relational model that is dynamically controlled by the clients
Clients can reason about the locality properties of the data
Data indexing can be row-wise as well as column-wise
Data can be served either from memory or from disk
Used internally by Google in more than 60 projects, including Google Earth, Google Analytics, Orkut, and Google Docs
79. Data Model
Values are stored as arrays of bytes and must be interpreted by the clients
Values are addressed by a 3-tuple (row-key, column-key, timestamp)
Row keys are strings of up to 64 KB
Rows are maintained in lexicographic order and are dynamically partitioned into tablets, the unit of distribution and load balancing
Reads can be made efficient (touching only a small number of servers) by choosing row keys wisely
Row ranges with small lexicographic distances are partitioned into fewer tablets
For instance, storing URLs in reverse order: com.cnn.blogs, com.cnn.www, etc.
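A toy model of this addressing scheme, assuming a plain Python dict in place of a real tablet server; it also shows the reversed-URL row-key trick from the last bullet:

```python
# Toy model of a BigTable-style table: a map from the 3-tuple
# (row_key, column_key, timestamp) to an uninterpreted byte string.
table = {}

def put(row_key, column_key, timestamp, value: bytes):
    table[(row_key, column_key, timestamp)] = value

def reverse_host(host: str) -> str:
    """'blogs.cnn.com' -> 'com.cnn.blogs', clustering one site's rows."""
    return ".".join(reversed(host.split(".")))

put(reverse_host("www.cnn.com"), "contents:", 1366416000, b"<html>...")
put(reverse_host("blogs.cnn.com"), "anchor:cnn.com", 1366416000, b"CNN Blogs")

# Rows sort lexicographically, so all com.cnn.* rows end up adjacent and a
# scan over one site touches only a few tablets.
for key in sorted(table):
    print(key)
```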
84. Columns
No limit on the number of columns per table
Columns are grouped into sets called column families based on their key prefix
A column family is the basic unit of access control
A column family is expected to store the same or similar type of data so that it can be compressed
A column family needs to be created before data can be stored in any of its columns
88. Timestamps
64-bit integers that represent different versions of a cell value
Assigned by either the datastore or the client
Cells are ordered in decreasing order of their timestamps
Automatic garbage collection can be used to remove old revisions
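A minimal sketch of version ordering and garbage collection for a single cell, with an assumed keep-the-newest-N retention policy (not BigTable's actual API):

```python
def put_version(cell_versions, timestamp, value, keep_last=3):
    """Insert one version of a cell, then garbage-collect all but the
    newest keep_last revisions (a hypothetical retention policy)."""
    cell_versions.append((timestamp, value))
    cell_versions.sort(reverse=True)   # decreasing timestamp order
    del cell_versions[keep_last:]      # automatic GC of old revisions

cell = []
for ts, val in [(1, b"a"), (3, b"c"), (2, b"b"), (4, b"d")]:
    put_version(cell, ts, val)
print(cell)  # [(4, b'd'), (3, b'c'), (2, b'b')]
```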
94. API
Read operations for lookup, selection, etc.
Write operations for creation, update, and deletion of values
Write operations for creation and deletion of tables and column families
Administrative operations to modify store configuration and metadata
MapReduce hooks
Transactions are atomic at the single-row level
100. HBase
Open source clone of BigTable, written in Java
Implemented atop HDFS
HBase can be the source and/or the sink of Hadoop jobs
Facebook Chat implemented using HBase
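For completeness, a small sketch using happybase, one common Python client for HBase's Thrift gateway; it assumes a running Thrift server and a table with a contents column family created beforehand:

```python
import happybase  # third-party client; assumes an HBase Thrift server

connection = happybase.Connection("localhost")
table = connection.table("webtable")  # hypothetical, pre-created table

# Column keys are "family:qualifier" and values are raw bytes, as in BigTable
table.put(b"com.cnn.www", {b"contents:html": b"<html>...</html>"})
row = table.row(b"com.cnn.www")
print(row[b"contents:html"])
```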
101. Outline: Cassandra
104. Introduction
Borrows concepts from both Dynamo and BigTable
Originally developed by Facebook but now an Apache open source project
Designed for Facebook's Inbox Search: efficiently storing, indexing, and searching messages
108. Design Goals
Processing of a large amount of data
Highly scalable
Reliability at massive scale
High-throughput writes without sacrificing read efficiency
114. Data Model
A table is a distributed multidimensional map indexed by a key
Rows are identified by a string key, and operations over a row are atomic per replica regardless of the number of columns
Column families encapsulate columns and super columns
Columns have a name and store a number of values per row, each with a timestamp
Super columns are columns with sub-columns
Only three operations: get, insert, and delete (see the pycassa sketch below)
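A short sketch of those three operations using pycassa, the Thrift-era Python client contemporary with this deck; it assumes a running Cassandra node with a keyspace Keyspace1 and a column family Users already defined:

```python
import pycassa  # Thrift-era client, contemporary with this deck

pool = pycassa.ConnectionPool("Keyspace1", server_list=["localhost:9160"])
users = pycassa.ColumnFamily(pool, "Users")  # pre-created column family

# The three operations the slide lists: insert, get, and delete (remove)
users.insert("zubair", {"email": "zubair.nabi@itu.edu.pk", "city": "Lahore"})
print(users.get("zubair", columns=["email"]))
users.remove("zubair")
```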
115. References
1 NoSQL Databases: https://oak.cs.ucla.edu/cs144/handouts/nosqldbs.pdf