presentation I gave on GaianDB - a dynamic federated distributed database available on IBM alphaWorks
The presentation wont make a lot of sense without speaker notes... which I've not written yet. Sorry about that.
Getting Started with Databricks SQL AnalyticsDatabricks
It has long been said that business intelligence needs a relational warehouse, but that view is changing. With the Lakehouse architecture being shouted from the rooftops, Databricks have released SQL Analytics, an alternative workspace for SQL-savvy users to interact with an analytics-tuned cluster. But how does it work? Where do you start? What does a typical Data Analyst’s user journey look like with the tool?
This session will introduce the new workspace and walk through the various key features – how you set up a SQL Endpoint, the query workspace, creating rich dashboards and connecting up BI tools such as Microsoft Power BI.
If you’re truly trying to create a Lakehouse experience that satisfies your SQL-loving Data Analysts, this is a tool you’ll need to be familiar with and include in your design patterns, and this session will set you on the right path.
Understanding and Improving Code GenerationDatabricks
Code generation is integral to Spark’s physical execution engine. When implemented, the Spark engine creates optimized bytecode at runtime improving performance when compared to interpreted execution. Spark has taken the next step with whole-stage codegen which collapses an entire query into a single function.
Learn to Use Databricks for Data ScienceDatabricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data.. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Comcast, GrubHub, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
[DSC Europe 22] Overview of the Databricks Platform - Petar ZecevicDataScienceConferenc1
Databricks' founders caused a seismic shift in data analysis community when they created Apache Spark which has become a cornerstone of Big Data processing pipelines and tools in large and small companies all around the world. Now they've built a revolutionary, comprehensive and easy-to-use platform around Apache Spark and their other inventions, such as MLFlow and Koalas frameworks and most importantly the Data Lakehouse: a concept of fusing data warehouse and data lake architectures into a single versatile and fast platform. Technical foundation for Databricks Data Lakehouse is Delta Lake. More than 7000 organizations today rely on Databricks to enable massive-scale data engineering, collaborative data science, full-lifecycle machine learning and business analytics. Come to the talk and see the demo to find out why.
Getting Started with Databricks SQL AnalyticsDatabricks
It has long been said that business intelligence needs a relational warehouse, but that view is changing. With the Lakehouse architecture being shouted from the rooftops, Databricks have released SQL Analytics, an alternative workspace for SQL-savvy users to interact with an analytics-tuned cluster. But how does it work? Where do you start? What does a typical Data Analyst’s user journey look like with the tool?
This session will introduce the new workspace and walk through the various key features – how you set up a SQL Endpoint, the query workspace, creating rich dashboards and connecting up BI tools such as Microsoft Power BI.
If you’re truly trying to create a Lakehouse experience that satisfies your SQL-loving Data Analysts, this is a tool you’ll need to be familiar with and include in your design patterns, and this session will set you on the right path.
Understanding and Improving Code GenerationDatabricks
Code generation is integral to Spark’s physical execution engine. When implemented, the Spark engine creates optimized bytecode at runtime improving performance when compared to interpreted execution. Spark has taken the next step with whole-stage codegen which collapses an entire query into a single function.
Learn to Use Databricks for Data ScienceDatabricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data.. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Comcast, GrubHub, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
[DSC Europe 22] Overview of the Databricks Platform - Petar ZecevicDataScienceConferenc1
Databricks' founders caused a seismic shift in data analysis community when they created Apache Spark which has become a cornerstone of Big Data processing pipelines and tools in large and small companies all around the world. Now they've built a revolutionary, comprehensive and easy-to-use platform around Apache Spark and their other inventions, such as MLFlow and Koalas frameworks and most importantly the Data Lakehouse: a concept of fusing data warehouse and data lake architectures into a single versatile and fast platform. Technical foundation for Databricks Data Lakehouse is Delta Lake. More than 7000 organizations today rely on Databricks to enable massive-scale data engineering, collaborative data science, full-lifecycle machine learning and business analytics. Come to the talk and see the demo to find out why.
If we could only predict the future of the software industry, we could make better investments and decisions. We could waste less resources on technology and processes we know will not last, or at least be conscious in our decisions to choose solutions with a limited life time. It turns out that for data engineering, we can predict the future, because it has already happened. Not in our workplace, but at a few leading companies that are blazing ahead. It has also already happened in the neighbouring field of software engineering, which is two decades ahead of data engineering regarding process maturity. In this presentation, we will glimpse into the future of data engineering. Data engineering has gone from legacy data warehouses with stored procedures, to big data with Hadoop and data lakes, on to a new form of modern data warehouses and low code tools aka "the modern data stack". Where does it go from here? We will look at the points where data leaders differ from the crowd and combine with observations on how software engineering has evolved, to see that it points towards a new, more industrialised form of data engineering - "data factory engineering".
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
While early big data systems, such as MapReduce, focused on batch processing, the demands on these systems have quickly grown. Users quickly needed to run (1) more interactive ad-hoc queries, (2) sophisticated multi-pass algorithms (e.g. machine learning), and (3) real-time stream processing. The result has been an explosion of specialized systems to tackle these new workloads. Unfortunately, this means more systems to learn, manage, and stitch together into pipelines. Spark is unique in taking a step back and trying to provide a *unified* post-MapReduce programming model that tackles all these workloads. By generalizing MapReduce to support fast data sharing and low-latency jobs, we achieve best-in-class performance in a variety of workloads, while providing a simple programming model that lets users easily and efficiently combine them.
Today, Spark is the most active open source project in big data, with high activity in both the core engine and a growing array of standard libraries built on top (e.g. machine learning, stream processing, SQL). I'm going to talk about the latest developments in Spark and show examples of how it can combine processing algorithms to build rich data pipelines in just a few lines of code.
Talk by Databricks CTO and Apache Spark creator Matei Zaharia at QCON San Francisco 2014.
Delta from a Data Engineer's PerspectiveDatabricks
Take a walk through the daily struggles of a data engineer in this presentation as we cover what is truly needed to create robust end to end Big Data solutions.
Event Sourcing, Domain Driven Design, and Command Query Responsibility Segregation – we hear all of these technologies used together frequently, but how do they actually work together? How do you manage complex co-ordination in a CQRS system?
In this talk, we will discuss a real world example of DDD with ES and CQRS written in F# - a functional first language on the .NET Framework. We’ll take a deep dive into the F# algebraic type system that constructs the domain model. Also, the explanation of the abstract notion of a DDD aggregate root compared to the CQRS implementation of an aggregate root. Finishing with how sagas and triggers facilitate poly-aggregate communication.
Accelerating Data Ingestion with Databricks AutoloaderDatabricks
Tracking which incoming files have been processed has always required thought and design when implementing an ETL framework. The Autoloader feature of Databricks looks to simplify this, taking away the pain of file watching and queue management. However, there can also be a lot of nuance and complexity in setting up Autoloader and managing the process of ingesting data using it. After implementing an automated data loading process in a major US CPMG, Simon has some lessons to share from the experience.
This session will run through the initial setup and configuration of Autoloader in a Microsoft Azure environment, looking at the components used and what is created behind the scenes. We’ll then look at some of the limitations of the feature, before walking through the process of overcoming these limitations. We will build out a practical example that tackles evolving schemas, applying transformations to your stream, extracting telemetry from the process and finally, how to merge the incoming data into a Delta table.
After this session you will be better equipped to use Autoloader in a data ingestion platform, simplifying your production workloads and accelerating the time to realise value in your data!
Data Privacy with Apache Spark: Defensive and Offensive ApproachesDatabricks
In this talk, we’ll compare different data privacy techniques & protection of personally identifiable information and their effects on statistical usefulness, re-identification risks, data schema, format preservation, read & write performance.
We’ll cover different offense and defense techniques. You’ll learn what k-anonymity and quasi-identifier are. Think of discovering the world of suppression, perturbation, obfuscation, encryption, tokenization, watermarking with elementary code examples, in case no third-party products cannot be used. We’ll see what approaches might be adopted to minimize the risks of data exfiltration.
Designing Structured Streaming Pipelines—How to Architect Things RightDatabricks
"Structured Streaming has proven to be the best platform for building distributed stream processing applications. Its unified SQL/Dataset/DataFrame APIs and Spark's built-in functions make it easy for developers to express complex computations. However, expressing the business logic is only part of the larger problem of building end-to-end streaming pipelines that interact with a complex ecosystem of storage systems and workloads. It is important for the developer to truly understand the business problem needs to be solved.
What are you trying to consume? Single source? Joining multiple streaming sources? Joining streaming with static data?
What are you trying to produce? What is the final output that the business wants? What type of queries does the business want to run on the final output?
When do you want it? When does the business want to the data? What is the acceptable latency? Do you really want to millisecond-level latency?
How much are you willing to pay for it? This is the ultimate question and the answer significantly determines how feasible is it solve the above questions.
These are the questions that we ask every customer in order to help them design their pipeline. In this talk, I am going to go through the decision tree of designing the right architecture for solving your problem."
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Kent Graziano
(updated slides used for North Texas DAMA meetup Oct 2018) As we move more and more towards the need for everyone to do Agile Data Warehousing, we need a data modeling method that can be agile with us. Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for over 15 years and is now growing in popularity. The purpose of this presentation is to provide attendees with an introduction to the components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics:
• What the basic components of a DV model are
• How to build, and design structures incrementally, without constant refactoring
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDatabricks
Adding nodes at runtime (Upscale) to already running Spark-on-Yarn clusters is fairly easy. But taking away these nodes (Downscale) when the workload is low at some later point of time is a difficult problem. To remove a node from a running cluster, we need to make sure that it is not used for compute as well as storage.
But on production workloads, we see that many of the nodes can’t be taken away because:
Nodes are running some containers although they are not fully utilized i.e., containers are fragmented on different nodes. Example. – each node is running 1-2 containers/executors although they have resources to run 4 containers.
Nodes have some shuffle data in the local disk which will be consumed by Spark application running on this cluster later. In this case, the Resource Manager will never decide to reclaim these nodes because losing shuffle data could lead to costly recomputation of stages.
In this talk, we will talk about how we can improve downscaling in Spark-on-YARN clusters under the presence of such constraints. We will cover changes in scheduling strategy for container allocation in YARN and Spark task scheduler which together helps us achieve better packing of containers. This makes sure that containers are defragmented on fewer set of nodes and thus some nodes don’t have any compute. In addition to this, we will also cover enhancements to Spark driver and External Shuffle Service (ESS) which helps us to proactively delete shuffle data which we already know has been consumed. This makes sure that nodes are not holding any unnecessary shuffle data – thus freeing them from storage and hence available for reclamation for faster downscaling.
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...Amazon Web Services
GumGum recently moved to Amazon DynamoDB from Apache Cassandra. In this session, we discuss the architecture and design decisions made in the process, including comparisons of different NoSQL database options. We also share the justifications and steps taken in order to plan and complete the migration process. Finally, we cover the benefits and outcome of the migration, including performance boost, cost savings, and maintenance reductions.
A Modern C++ Kafka API | Kenneth Jia, Morgan StanleyHostedbyConfluent
We wanted to embed a Kafka producer/consumer in C++ and decided to use ""librdkafka"", a robust C/C++ library that is open source, well-maintained, and widely used.
The C++ interface of ""librdkafka"" is confined to C++ 98 for compatibility, which makes it less object-oriented and user-friendly. Raw pointers in the interface complicates object lifecycle management and limited encapsulation leading to a more complex API.
That is why we built a C++ Kafka API using modern C++ as a wrapper on top of ""librdkafka"". This header-only library is quite similar to the Java API, object-oriented and user-friendly. It frees us from the burdens mentioned above so even a novice can pick it up and work out an efficient solution.
The ""modern-cpp-kafka"" project on github has been thoroughly tested within our company. After using it to replace a legacy implementation, throughput for a key middleware system gained 26%!
If we could only predict the future of the software industry, we could make better investments and decisions. We could waste less resources on technology and processes we know will not last, or at least be conscious in our decisions to choose solutions with a limited life time. It turns out that for data engineering, we can predict the future, because it has already happened. Not in our workplace, but at a few leading companies that are blazing ahead. It has also already happened in the neighbouring field of software engineering, which is two decades ahead of data engineering regarding process maturity. In this presentation, we will glimpse into the future of data engineering. Data engineering has gone from legacy data warehouses with stored procedures, to big data with Hadoop and data lakes, on to a new form of modern data warehouses and low code tools aka "the modern data stack". Where does it go from here? We will look at the points where data leaders differ from the crowd and combine with observations on how software engineering has evolved, to see that it points towards a new, more industrialised form of data engineering - "data factory engineering".
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
While early big data systems, such as MapReduce, focused on batch processing, the demands on these systems have quickly grown. Users quickly needed to run (1) more interactive ad-hoc queries, (2) sophisticated multi-pass algorithms (e.g. machine learning), and (3) real-time stream processing. The result has been an explosion of specialized systems to tackle these new workloads. Unfortunately, this means more systems to learn, manage, and stitch together into pipelines. Spark is unique in taking a step back and trying to provide a *unified* post-MapReduce programming model that tackles all these workloads. By generalizing MapReduce to support fast data sharing and low-latency jobs, we achieve best-in-class performance in a variety of workloads, while providing a simple programming model that lets users easily and efficiently combine them.
Today, Spark is the most active open source project in big data, with high activity in both the core engine and a growing array of standard libraries built on top (e.g. machine learning, stream processing, SQL). I'm going to talk about the latest developments in Spark and show examples of how it can combine processing algorithms to build rich data pipelines in just a few lines of code.
Talk by Databricks CTO and Apache Spark creator Matei Zaharia at QCON San Francisco 2014.
Delta from a Data Engineer's PerspectiveDatabricks
Take a walk through the daily struggles of a data engineer in this presentation as we cover what is truly needed to create robust end to end Big Data solutions.
Event Sourcing, Domain Driven Design, and Command Query Responsibility Segregation – we hear all of these technologies used together frequently, but how do they actually work together? How do you manage complex co-ordination in a CQRS system?
In this talk, we will discuss a real world example of DDD with ES and CQRS written in F# - a functional first language on the .NET Framework. We’ll take a deep dive into the F# algebraic type system that constructs the domain model. Also, the explanation of the abstract notion of a DDD aggregate root compared to the CQRS implementation of an aggregate root. Finishing with how sagas and triggers facilitate poly-aggregate communication.
Accelerating Data Ingestion with Databricks AutoloaderDatabricks
Tracking which incoming files have been processed has always required thought and design when implementing an ETL framework. The Autoloader feature of Databricks looks to simplify this, taking away the pain of file watching and queue management. However, there can also be a lot of nuance and complexity in setting up Autoloader and managing the process of ingesting data using it. After implementing an automated data loading process in a major US CPMG, Simon has some lessons to share from the experience.
This session will run through the initial setup and configuration of Autoloader in a Microsoft Azure environment, looking at the components used and what is created behind the scenes. We’ll then look at some of the limitations of the feature, before walking through the process of overcoming these limitations. We will build out a practical example that tackles evolving schemas, applying transformations to your stream, extracting telemetry from the process and finally, how to merge the incoming data into a Delta table.
After this session you will be better equipped to use Autoloader in a data ingestion platform, simplifying your production workloads and accelerating the time to realise value in your data!
Data Privacy with Apache Spark: Defensive and Offensive ApproachesDatabricks
In this talk, we’ll compare different data privacy techniques & protection of personally identifiable information and their effects on statistical usefulness, re-identification risks, data schema, format preservation, read & write performance.
We’ll cover different offense and defense techniques. You’ll learn what k-anonymity and quasi-identifier are. Think of discovering the world of suppression, perturbation, obfuscation, encryption, tokenization, watermarking with elementary code examples, in case no third-party products cannot be used. We’ll see what approaches might be adopted to minimize the risks of data exfiltration.
Designing Structured Streaming Pipelines—How to Architect Things RightDatabricks
"Structured Streaming has proven to be the best platform for building distributed stream processing applications. Its unified SQL/Dataset/DataFrame APIs and Spark's built-in functions make it easy for developers to express complex computations. However, expressing the business logic is only part of the larger problem of building end-to-end streaming pipelines that interact with a complex ecosystem of storage systems and workloads. It is important for the developer to truly understand the business problem needs to be solved.
What are you trying to consume? Single source? Joining multiple streaming sources? Joining streaming with static data?
What are you trying to produce? What is the final output that the business wants? What type of queries does the business want to run on the final output?
When do you want it? When does the business want to the data? What is the acceptable latency? Do you really want to millisecond-level latency?
How much are you willing to pay for it? This is the ultimate question and the answer significantly determines how feasible is it solve the above questions.
These are the questions that we ask every customer in order to help them design their pipeline. In this talk, I am going to go through the decision tree of designing the right architecture for solving your problem."
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Kent Graziano
(updated slides used for North Texas DAMA meetup Oct 2018) As we move more and more towards the need for everyone to do Agile Data Warehousing, we need a data modeling method that can be agile with us. Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for over 15 years and is now growing in popularity. The purpose of this presentation is to provide attendees with an introduction to the components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics:
• What the basic components of a DV model are
• How to build, and design structures incrementally, without constant refactoring
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDatabricks
Adding nodes at runtime (Upscale) to already running Spark-on-Yarn clusters is fairly easy. But taking away these nodes (Downscale) when the workload is low at some later point of time is a difficult problem. To remove a node from a running cluster, we need to make sure that it is not used for compute as well as storage.
But on production workloads, we see that many of the nodes can’t be taken away because:
Nodes are running some containers although they are not fully utilized i.e., containers are fragmented on different nodes. Example. – each node is running 1-2 containers/executors although they have resources to run 4 containers.
Nodes have some shuffle data in the local disk which will be consumed by Spark application running on this cluster later. In this case, the Resource Manager will never decide to reclaim these nodes because losing shuffle data could lead to costly recomputation of stages.
In this talk, we will talk about how we can improve downscaling in Spark-on-YARN clusters under the presence of such constraints. We will cover changes in scheduling strategy for container allocation in YARN and Spark task scheduler which together helps us achieve better packing of containers. This makes sure that containers are defragmented on fewer set of nodes and thus some nodes don’t have any compute. In addition to this, we will also cover enhancements to Spark driver and External Shuffle Service (ESS) which helps us to proactively delete shuffle data which we already know has been consumed. This makes sure that nodes are not holding any unnecessary shuffle data – thus freeing them from storage and hence available for reclamation for faster downscaling.
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...Amazon Web Services
GumGum recently moved to Amazon DynamoDB from Apache Cassandra. In this session, we discuss the architecture and design decisions made in the process, including comparisons of different NoSQL database options. We also share the justifications and steps taken in order to plan and complete the migration process. Finally, we cover the benefits and outcome of the migration, including performance boost, cost savings, and maintenance reductions.
A Modern C++ Kafka API | Kenneth Jia, Morgan StanleyHostedbyConfluent
We wanted to embed a Kafka producer/consumer in C++ and decided to use ""librdkafka"", a robust C/C++ library that is open source, well-maintained, and widely used.
The C++ interface of ""librdkafka"" is confined to C++ 98 for compatibility, which makes it less object-oriented and user-friendly. Raw pointers in the interface complicates object lifecycle management and limited encapsulation leading to a more complex API.
That is why we built a C++ Kafka API using modern C++ as a wrapper on top of ""librdkafka"". This header-only library is quite similar to the Java API, object-oriented and user-friendly. It frees us from the burdens mentioned above so even a novice can pick it up and work out an efficient solution.
The ""modern-cpp-kafka"" project on github has been thoroughly tested within our company. After using it to replace a legacy implementation, throughput for a key middleware system gained 26%!
Object multifunctional indexing with an open API akvalex
NBITSearch is a search engine with an open API for local stations, LAN and Internet. Advantages over counterparts:
1. Object indexing. It allows to index objects S of any types T.
2. Multifunctional indexing. It allows to index objects simultaneously by a set of any functions F (S).
3. Very fast search. It allows to save time and money.
NBITSearch is a search engine with an open API for local stations, LAN and Internet. Advantages over counterparts:
1. Object indexing. It allows to index objects S of any types T.
2. Multifunctional indexing. It allows to index objects simultaneously by set any functions F (S).
3. Very fast search. It allows to save time and money.
In 2001, as early high-speed networks were deployed, George Gilder observed that “when the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances.” Two decades later, our networks are 1,000 times faster, our appliances are increasingly specialized, and our computer systems are indeed disintegrating. As hardware acceleration overcomes speed-of-light delays, time and space merge into a computing continuum. Familiar questions like “where should I compute,” “for what workloads should I design computers,” and "where should I place my computers” seem to allow for a myriad of new answers that are exhilarating but also daunting. Are there concepts that can help guide us as we design applications and computer systems in a world that is untethered from familiar landmarks like center, cloud, edge? I propose some ideas and report on experiments in coding the continuum.
Towards neuralprocessingofgeneralpurposeapproximateprogramsParidha Saxena
Did validation of one of the machine learning algorithms of neural networks,and compared the results for its implementation on hardware (FPGA) using xilinx, with that of a sequential code execution(using FANN).
Neurogrid : A Mixed-Analog-Digital Multichip System for Large-Scale Neural Si...IIT Bombay
We describe the design of Neurogrid, a neuromorphic system for simulating large-scale neural models in real time. Neuromorphic systems realize the function of biological neural systems by emulating their structure. Designers of such systems face three major design choices: 1) whether to emulate the four neural elements-axonal arbor, synapse, dendritic tree, and soma-with dedicated or shared electronic circuits; 2) whether to implement these electronic circuits in an analog or digital manner; and 3) whether to interconnect arrays of these silicon neurons with a mesh or a tree network. The choices we made were: 1) we emulated all neural elements except the soma with shared electronic circuits; this choice maximized the number of synaptic connections; 2) we realized all electronic circuits except those for axonal arbors in an analog manner; this choice maximized energy efficiency; and 3) we interconnected neural arrays in a tree network; this choice maximized throughput. These three choices made it possible to simulate a million neurons with billions of synaptic connections in real time-for the first time-using 16 Neurocores integrated on a board that consumes three watts.
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology.
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Enchancing the Data Collection in Tree based Wireless Sensor Networksijsrd.com
Number of techniques used in Wireless Sensor Network to improve data collection from sensor nodes. It achieve by minimize the schedule length and dynamic channel assignment. Schedule length minimized by BFS algorithm without interfering links. Interfering links can be eliminated by transmission power control and multi frequency. The power can be save by using beacon signal. Collection of data can also be limited by topology of network. So the nodes are arranged in form. The capacitated minimal spanning trees and degree- constrained spanning trees give significant improvement in scheduling. Finally the data collection is enhancing in terms of security by using T-Hash Chain algorithm.
Security-Aware Scheduling for Real-Time Parallel Applications on ClustersXiao Qin
Outline:
1. Motivation
Problem Statement
Motivations
2. A Security-Aware Middleware Model
Architecture of the Security Middleware Model
Quality of Security Control Manager
Security Service Requirements Specification
3. Security Overhead Models
Confidentiality Overhead
Integrity Overhead
Authentication Overhead
4. A Task Allocation Scheme
Mathematical Models
System Models
Task Models
The TAPADS Task Allocation Scheme
Performance Evaluation
5. Improving Security for Local Disk Systems
Motivation
Architecture and Disk Requests with Security Requirements
An Adaptive Write Strategy
Performance Evaluation
Synthetic Benchmarks
Real I/O-Intensive Applications
6. Quality of Security Adaptation for Cluster Storage Systems
System Architecture
The Framework
Data Partitioning
Estimating Response Times
The Quality of Security Control Algorithm
Performance Evaluation
7. Conclusions
In remote sensor arrange messages are exchanged between the different source and goal matches agreeably such way that multi-jump parcel transmission is utilized. These information bundles are exchanged from the middle of the road hub to sink hub by sending a parcel to goal hubs. Where each hub overhears transmission close neighbor hub. To dodge this we propose novel approach with proficient steering convention i.e. most brief way directing and conveyed hub steering calculation. Proposed work additionally concentrates on Automatic Repeat Request and Deterministic Network coding. We spread this work by the end to end message encoding instrument. To upgrade hub security match shrewd key era is utilized, in which combined conveying hub is allocated with combine key to making secure correspondence. End to end. We dissect both single and numerous hubs and look at basic ARQ and deterministic system coding as strategies for transmission.
International Refereed Journal of Engineering and Science (IRJES)irjes
a leading international journal for publication of new ideas, the state of the art research results and fundamental advances in all aspects of Engineering and Science. IRJES is a open access, peer reviewed international journal with a primary objective to provide the academic community and industry for the submission of half of original research and applications.
This session will quickly show you how to describe the security configuration of your Kafka cluster in an AsyncAPI document. And if you've been given an AsyncAPI document, this session will show you how to use that to configure a Kafka client or application to connect to the cluster, using the details in the AsyncAPI spec.
An intro to serverless and OpenWhisk for Kafka usersDale Lane
My talk from Kafka Summit London 2019. It's aimed at Kafka users who want to know what the serverless buzz is all about, with an explanation of what it's for, and a quick crash course in how to get started with Apache OpenWhisk.
Video at https://dalelane.co.uk/blog/?p=3769
How to increase the social impact you makeDale Lane
Speakers notes at http://dalelane.co.uk/blog/?p=3685
Machine Learning for Kids puts IBM’s AI technologies in the hands of children so they can learn about AI by playing, experimenting and making with them. It was created as a volunteer activity, initially for Dale’s own use, but this was incrementally scaled - with a variety of approaches used to enable and encourage use by both IBM locations and schools around the world.
This presentation describes some of the lessons learned doing this. It describes how you could increase the impact of the volunteer work that you do and ways to turn something that you do in a single local school into something that can be used around the world.
Faith's talk at Barcamp Southampton 2016
She made a chat bot that can answer questions about owls, and used that for most of her presentation. Here are some slides we used to talk about why she chose owls, and how she made the bot.
Pushing, pulling or leaving the door openDale Lane
A talk about mobile apps that rely on data from the Internet, and some of the decisions and choices facing mobile app developers in writing them
SlideShare kinda screws with the speaker's notes, so if you'd like the notes it's probably best to download the presentation file.
Overview of the talk is written up at http://dalelane.co.uk/blog/?p=1009
a hack created at Open Hack London 2009
hacked together using:
- Yahoo! Blueprint
- Yahoo! Fire Eagle
- Google App Engine
in under a day.
see it live at http://feguestpass.appspot.com/
slightly more info at http://dalelane.co.uk/blog/?p=667
Another CurrentCost presentation. Updated to put it in a bit of context, and to reflect some more recent hacking efforts
Presented at BarCampLondon6 and BathCamp
In this presentation, I give an introduction to Windows PowerShell:
- What is it, and how does it work?
- How can you extend it to provide support for administering your own product or project?
NOTES:
1) Some of the text in this presentation is a little small for reading in a 400 pixel flash viewer. I'd recommend downloading the presentation instead.
2) The slides might not make sense without the notes that go with them. I've added the notes as comments to each slide. They still might not make much sense, but that's a different problem :-)
Cutting the grass - a work in progress. Or a very poor attempt at stop-frame animation :-)
A series of photos - taken every 10 mins or so from the same position, as I cut the grass on a Sunday afternoon
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
8. A dynamic network
Biologically-Inspired Self-Organisation
Exploit natural selection in nature to
build better networks
Robust self-organizing network
architectures
Frameworks and algorithms for robust
fault-tolerant information dissemination
Robust communications with minimal
complexity or human control
9. Gaian database
N0
N3
N11
N4
N5
N1
N2
N6
N7
N8
N10
N9
SQL Query
N0
N3
N11
N4
N5
N1
N2
N6
N7
N8
N10
N9
SQL Query
N0
N3
N11
N4
N5
N1
N2
N6
N7
N8
N10
N9
SQL Query
N0
N3
N11
N4
N5
N1
N2
N6
N7
N8
N10
N9
SQL Queries
Queries routed to all database nodes – a
flood query, but retrieving only the data
required to satisfy a query
Exchanges query traffic in the network for
data traffic – aiming to minimize total traffic
Predicated on a concept of ‘store
data locally - read data from
anywhere’ paradigm
10. Architecture
GaianDB
Derby Engine: Parsing, Compilation, Execution
GaianPStmtNode VTI:
Executes queries on physical leaf nodes +
Propagates the original SQL (+ queryID & steps state info) to linked Gaian nodes
Instantiates Invokes costing
methods
Pushes columns
and ‘where’ clause
in a structure
MQ(tt)
Stream Data
Original SQL
DB2 Oracle MS
SQLServer
Sybase MySQL Flat files
In-memory
tables
Derby
GaianDB
GaianDB
GaianDB
propagate
Text Index
Derby
tables
N0
N3
N11
N4
N5
N1
N2
N6
N7
N8
N10
N9
SQL Query
N0
N3
N11
N4
N5
N1
N2
N6
N7
N8
N10
N9
SQL Query
Expanded Node
Multithreaded, breadth-first query propagation
Loop detection/handling – no duplicates
11. Performance – with 1,250 nodes
Query time for 1025 nodes, fetching up to 1025 rows from each
y = 4.217x + 349.251
0
1000
2000
3000
4000
5000
6000
0 200 400 600 800 1000 1200
Row s fetched per node
Time(milliseconds)
Query Execute Time
Total Query Time
Linear (Total Query Time)
Query Performance
0.0
53.9
107.8
161.7
215.6
269.5
323.4
377.3
431.2
485.1
539.0
0 200 400 600 800 1000 1200
Number of Nodes
QueryTime(milliseconds)
Average Query Time
Predicted Max (Layers)
Predicted Min (Layers)
12. Performance questions
The time to propagate a query to all of
the nodes in the database, as a function
of the number of database nodes (N);
The time to fetch data from across the
nodes of the database to a single node,
as a function of the volume of data;
The time to fetch data from across the
database to multiple nodes concurrently
querying, as a function of the number
of nodes concurrently querying.
13. Graph metrics
The eccentricity ε(νi) of a graph
vertex νi is the maximum graph
distance between νi and any other
vertex νj of G i.e. the "longest
shortest path" between any two
graph vertices (νi , νj) of the graph.
The maximum eccentricity is the
graph diameter Gd. The minimum
graph eccentricity is the graph
radius Gr. We define the size of G as
the number of vertices N and the
number of connections at each
vertex as the vertex degree δi
(1 < i ≤ N).
14. Biologically inspired self-organisation
0
1
2
3
4
5
6
7
8
9
10
0 200 400 600 800 1000
Number of Nodes (N)
GraphDimension(edges)
Radius
Diameter
(1+e)ln(N)
(1-e)ln(N)
Network growth by
preferential attachment
Using a fitness function at
each node
Limit maximum vertex degree =10
Gd = nint [ (1+e) * ln(N) ]
Gr = nint [ (1-e) * ln(N) ]
e = 0.24
15. Query propagation time
The predicted maximum (Tmax) and
minimum times (Tmin) to execute the
flood query are:
TL = link latency
Tp = processor delay
Tmax = (Gd + 1)(TL + Tp)
Tmin = (Gr + 1)(TL + Tp)
with the predicted execute query time
from any node (Tν) being:
Tν = (ε(ν) + 1)(TL + Tp)
Hence substituting for ε(ν)
Tν = nint[1 + B * ln(N) * (TL + Tp)]
17. Measured data fetch
Query time to fetch 1 million rows
y = 4.217x + 349.251
y = 1.7383x + 678.141
0
1000
2000
3000
4000
5000
6000
0 200000 400000 600000 800000 1000000 1200000
Total Rows fetched
Time(milliseconds)
Total Query Time 1025 nodes
Total Query Time 1 node
Total Query Time 1 node indexed
Linear (Total Query Time 1025 nodes)
Linear (Total Query Time 1 node)
25. Image credits
Background: YouTube video “The Internet of Things”, IBM
http://www.youtube.com/watch?v=sfEbMV295Kk
Icons: DB and envelope icons, Tim Morgan
http://flickr.com/photos/timothymorgan/sets/1615269
Microsoft Excel icon, Vincent Garnier (courtesy of IconArchive)
http://iconarchive.com/show/softdimension-icons-by-benjigarner/Excel-icon.html
Photo of car mechanics, Tomas
http://flickr.com/photos/tma/2264878
All other images original from GaianDB work