What is a real-time recommendation engine? Our Senior Software Engineer, David Lippa, and our CTO, Jason Vertrees, break down the background, method, and results.
Part of a series on integrating space assets into airport management and operation.
Made as part of Ascend XYZ, Ammo https://artes-apps.esa.int/projects/ammo
This is the first poster I presented as part of my PhD. It focuses on executing N-body simulations using GRAPE specialized hardware on machines in different continents.
This is a talk titled "Cloud-Based Services For Large Scale Analysis of Sequence & Expression Data: Lessons from Cistrack" that I gave at CAMDA 2009 on October 6, 2009.
Big Data and Geospatial with HPCC Systems (HPCC Systems)
This presentation covers one topic that we have mastered after several years: geospatial.
We will reveal how we deal with very specific spatial challenges in our day-to-day use cases:
• Answering questions that combine the best of Big Data and geospatial analysis.
• Ingesting and using raster and vector data with our Massively Parallel Processing platform (Thor).
• Storing and querying spatial information with sub-second response times, using our rapid data delivery engine (Roxie).
And much more, under the umbrella of LexisNexis HPCC Systems (High Performance Computing Cluster), an open-source platform for Big Data processing and analytics.
Spark for Behavioral Analytics Research: Spark Summit East talk by John Wu (Spark Summit)
This presentation reports our experience using machine learning techniques in the Apache Spark ecosystem to understand user behavior in a number of applications. In this context, Spark makes the vast computing power of a large high-performance computing system available to behavioral economists without requiring the application scientists to learn parallel computing. To illustrate the effectiveness of this approach, we focus on the compute-intensive task of establishing a baseline for studying the impact of policies on consumer behavior. The gold standard for this type of baseline is a randomized control group; however, a control group can only provide a group-level reference, not one for individual consumers. In many cases, self-selection bias along with other factors can make it extremely difficult to generate an unbiased control group. By harnessing the computing power of Spark, we are able to learn the behavior pattern of each individual user and therefore create a much more precise baseline for behavioral analysis. We will use two case studies to illustrate the approach: a residential electricity usage study and a traffic pattern prediction study.
Given at PyDataSV 2014
In machine learning, clustering is a good way to explore your data and pull out patterns and relationships. Scikit-learn has some great clustering functionality, including the k-means clustering algorithm, which is among the easiest to understand. Let's take an in-depth look at k-means clustering and how to use it. This mini-tutorial/talk will cover what sort of problems k-means clustering is good at solving, how the algorithm works, how to choose k, how to tune the algorithm's parameters, and how to implement it on a set of data.
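The loop at the heart of the algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its points, repeat until nothing changes) can be sketched in a few lines of NumPy. This is a minimal illustration with naive seeding, not scikit-learn's `KMeans`, which uses the smarter k-means++ initialization:

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal k-means: alternate assignment and centroid-update steps.

    Naive seeding: the first k points become the initial centroids
    (real implementations use k-means++ or multiple random restarts).
    """
    centroids = X[:k].copy()
    for _ in range(n_iter):
        # assignment step: label each point with its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

Choosing k is typically done by running the algorithm for several values of k and inspecting the within-cluster sum of squares (the "elbow" method) or a validity index such as the silhouette score.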
In this deck from GTC 2019, Seongchan Kim, Ph.D. presents: How Deep Learning Could Predict Weather Events.
"How do meteorologists predict weather or weather events such as hurricanes, typhoons, and heavy rain? Predicting weather events were done based on supercomputer (HPC) simulations using numerical models such as WRF, UM, and MPAS. But recently, many deep learning-based researches have been showing various kinds of outstanding results. We'll introduce several case studies related to meteorological researches. We'll also describe how the meteorological tasks are different from general deep learning tasks, their detailed approaches, and their input data such as weather radar images and satellite images. We'll also cover typhoon detection and tracking, rainfall amount prediction, forecasting future cloud figure, and more."
Watch the video: https://wp.me/p3RLHQ-k2T
Learn more: http://en.kisti.re.kr/
and
https://www.nvidia.com/en-us/gtc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Efficient Online Evaluation of Big Data Stream Classifiers (Albert Bifet)
The evaluation of classifiers in data streams is fundamental so that poorly-performing models can be identified, and either improved or replaced by better-performing models. This is an increasingly relevant and important task as stream data is generated from more sources, in real-time, in large quantities, and is now considered the largest source of big data. Both researchers and practitioners need to be able to effectively evaluate the performance of the methods they employ. However, there are major challenges for evaluation in a stream. Instances arriving in a data stream are usually time-dependent, and the underlying concept that they represent may evolve over time. Furthermore, the massive quantity of data also tends to exacerbate issues such as class imbalance. Current frameworks for evaluating streaming and online algorithms are able to give predictions in real-time, but as they use a prequential setting, they build only one model, and are thus not able to compute the statistical significance of results in real-time. In this paper we propose a new evaluation methodology for big data streams. This methodology addresses unbalanced data streams, data where change occurs on different time scales, and the question of how to split the data between training and testing, over multiple models.
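For context, the prequential (interleaved test-then-train) protocol that the paper contrasts itself with evaluates each arriving instance before learning from it. A toy sketch with a trivial majority-class learner, meant only to illustrate the protocol, not the paper's new methodology:

```python
from collections import Counter

def prequential_accuracy(stream):
    """Interleaved test-then-train: predict each label, then learn it."""
    counts = Counter()
    correct = 0
    for label in stream:
        # test first: predict the majority class seen so far
        pred = counts.most_common(1)[0][0] if counts else None
        correct += (pred == label)
        # then train: update the (trivial) model with the true label
        counts[label] += 1
    return correct / len(stream)
```

Because every instance produces exactly one prediction from one continuously updated model, this setting yields a single accuracy trajectory, which is precisely why significance testing across multiple models requires the extended methodology the paper proposes.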
Of course, you know what data is. You probably know what Big Data and small data are. But what the heck is all that buzz about data? Why is it so important today? These are the questions this session addresses, going beyond definitions and descriptions. We will talk about data, about different options for data usage, and about how we can benefit from data.
Efficient processing of Rank-aware queries in Map/Reduce (Spiros Oikonomakis)
Through an experimental study executing three different algorithms, this work aims to show the disadvantages of the default operation of the Map/Reduce programming model for top-k queries, and to present a recommended solution for the effective processing of such query types. Two of the major shortcomings that occur are addressed: early termination and load balancing. Code implementing this solution is provided.
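The early-termination idea can be sketched as follows: each mapper keeps only a local top-k instead of emitting every record, so the reducer merges k candidates per mapper rather than the whole input. This is a simplified single-process sketch of the pattern, not the thesis implementation:

```python
import heapq

def mapper_topk(partition, k):
    """Each mapper emits only its local top-k scores (early termination)."""
    return heapq.nlargest(k, partition)

def reducer_topk(local_lists, k):
    """The reducer merges the small local lists into the global top-k."""
    return heapq.nlargest(k, (x for part in local_lists for x in part))

def topk(partitions, k):
    # the global top-k is always contained in the union of local top-k lists
    return reducer_topk([mapper_topk(p, k) for p in partitions], k)
```

Shipping only k values per mapper instead of whole partitions is what cuts the shuffle cost; load balancing is then a matter of sizing the partitions evenly.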
Journal club done with Vid Stojevic for PointNet:
https://arxiv.org/abs/1612.00593
https://github.com/charlesq34/pointnet
http://stanford.edu/~rqi/pointnet/
Deep learning for indoor point cloud processing. PointNet provides a unified architecture operating directly on unordered point clouds, without voxelisation, for applications ranging from object classification and part segmentation to scene semantic parsing.
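PointNet's key structural trick is applying the same per-point function with shared weights and then aggregating with a symmetric max-pool, which makes the global feature invariant to the ordering of the input points. A toy NumPy sketch of that symmetry argument, with hypothetical random weights standing in for a trained MLP:

```python
import numpy as np

def pointnet_feature(points, W):
    """Shared per-point transform (one linear layer + ReLU here),
    followed by a symmetric max-pool over the point dimension."""
    h = np.maximum(points @ W, 0.0)   # same weights applied to every point
    return h.max(axis=0)              # order-independent global feature

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))           # hypothetical weights; learned in practice
cloud = rng.normal(size=(100, 3))     # an unordered point cloud
shuffled = cloud[rng.permutation(100)]
```

Because max is symmetric, `pointnet_feature(cloud, W)` and `pointnet_feature(shuffled, W)` are identical, which is exactly the property that lets the real network consume raw, unordered points.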
Alternative download link:
https://www.dropbox.com/s/ziyhgi627vg9lyi/3D_v2017_initReport.pdf?dl=0
Scalable Rough C-Means clustering using Firefly algorithm
Abhilash Namdev and B. K. Tripathy
Significance of Embedded Systems to IoT
P. R. S. M. Lakshmi, P. Lakshmi Narayanamma and K. Santhi Sri
Cognitive Abilities, Information Literacy Knowledge and Retrieval Skills of Undergraduates: A Comparison of Public and Private Universities in Nigeria
Janet O. Adekannbi and Testimony Morenike Oluwayinka
Risk Assessment in Constructing Horseshoe Vault Tunnels using Fuzzy Technique
Erfan Shafaghat and Mostafa Yousefi Rad
Evaluating the Adoption of Deductive Database Technology in Augmenting Criminal Intelligence in Zimbabwe: Case of Zimbabwe Republic Police
Mahlangu Gilbert, Furusa Samuel Simbarashe, Chikonye Musafare and Mugoniwa Beauty
Analysis of Petrol Pumps Reachability in Anand District of Gujarat
Nidhi Arora
Scikit-Learn is a powerful machine learning library implemented in Python on top of the numeric and scientific computing powerhouses NumPy, SciPy, and matplotlib, enabling extremely fast analysis of small to medium-sized data sets. It is open source, commercially usable, and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason, Scikit-Learn is often the first tool in a Data Scientist's toolkit for machine learning on incoming data sets.
The purpose of this one-day course is to serve as an introduction to machine learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product, an actionable model that can be used in larger programs or algorithms, rather than simply as a research or investigation methodology.
Experimental study of Data clustering using k-Means and modified algorithms (IJDKP)
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents results of an experimental study of different approaches to k-Means clustering, comparing results on different datasets using the original k-Means and other modified algorithms implemented in MATLAB R2009b. The results are evaluated on several performance measures, such as number of iterations, number of points misclassified, accuracy, Silhouette validity index, and execution time.
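One of the measures mentioned, the Silhouette validity index, compares each point's mean intra-cluster distance a with its mean distance b to the nearest other cluster, scoring s = (b - a) / max(a, b), so that well-separated clusters approach 1 and misassigned points go negative. A small direct O(n²) sketch (not MATLAB's built-in):

```python
import numpy as np

def silhouette(X, labels):
    """Mean Silhouette index over all points: s = (b - a) / max(a, b)."""
    scores = []
    clusters = set(labels.tolist())
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        same = labels == labels[i]
        # a: mean distance to the other points in the same cluster
        a = d[same].sum() / max(same.sum() - 1, 1)
        # b: mean distance to the nearest other cluster
        b = min(d[labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Comparing the index across runs (or across values of k) is a standard way to rank the clusterings such a study produces.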
New Approach for K-mean and K-medoids Algorithm (Editor IJCATR)
K-means and K-medoids clustering algorithms are widely used in many practical applications. The original k-means and k-medoids algorithms select initial centroids and medoids randomly, which affects the quality of the resulting clusters and sometimes generates unstable and empty clusters that are meaningless. They are also computationally expensive, requiring time proportional to the product of the number of data items, the number of clusters, and the number of iterations. The new approach for the k-means algorithm eliminates this deficiency of the existing k-means: it first calculates the initial centroids according to the requirements of users and then produces better, more effective and stable clusters. It also takes less execution time because it eliminates unnecessary distance computations by reusing results from the previous iteration. The new approach for k-medoids selects medoids systematically based on the initial centroids, generating stable clusters with improved accuracy.
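The abstract does not fully specify how the initial centroids are calculated. As one concrete example of systematic (non-random) seeding in the same spirit, the farthest-point heuristic is deterministic and easy to state; this is an illustrative stand-in, not the paper's actual method:

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: start from the point nearest the data mean,
    then repeatedly add the point farthest from all chosen centroids."""
    centroids = [X[np.linalg.norm(X - X.mean(axis=0), axis=1).argmin()]]
    while len(centroids) < k:
        # distance from each point to its nearest already-chosen centroid
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)
```

Because the seeds are a pure function of the data, repeated runs give identical clusters, which is the stability property such approaches aim for.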
Scalable and efficient cluster based framework for multidimensional indexing (eSAT Journals)
Abstract: Indexing high-dimensional data has utility in many real-world applications; in particular, it dramatically improves the information retrieval process. Existing techniques address the "curse of dimensionality" of high-dimensional data sets using the Vector Approximation File (VA-File), which results in sub-optimal performance. Compared with the VA-File, clustering yields a more compact representation of the data set, as it exploits inter-dimensional correlations. However, pruning of unwanted clusters is important, and existing pruning techniques based on bounding rectangles or bounding hyperspheres have problems in NN search. To overcome this, Ramaswamy and Rose proposed adaptive cluster distance bounding for high-dimensional indexing, which also includes efficient spatial filtering. In this paper we implement this high-dimensional indexing approach and build a prototype application as a proof of concept. Experimental results are encouraging, and the prototype can be used in real-time applications. Index Terms: Clustering, high-dimensional indexing, similarity measures, multimedia databases.
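The cluster-pruning idea behind such indexes can be sketched with the triangle inequality: a cluster whose centroid distance minus its radius already exceeds the best distance found so far cannot contain the nearest neighbour. This is a toy single-machine sketch of the pruning principle, not Ramaswamy and Rose's exact adaptive bounds:

```python
import numpy as np

def nn_search(q, clusters):
    """clusters: list of (centroid, radius, points) tuples.

    Prune whole clusters via the lower bound ||q - centroid|| - radius.
    """
    best, best_d = None, np.inf
    # visit clusters in order of their optimistic lower bound
    order = sorted(clusters, key=lambda c: np.linalg.norm(q - c[0]) - c[1])
    for centroid, radius, pts in order:
        if np.linalg.norm(q - centroid) - radius >= best_d:
            continue  # no point in this cluster can beat the current best
        d = np.linalg.norm(pts - q, axis=1)
        i = d.argmin()
        if d[i] < best_d:
            best, best_d = pts[i], d[i]
    return best, float(best_d)
```

Tighter, data-adaptive bounds (as in the paper's approach) prune more clusters than this simple spherical bound, which is where the performance gains come from.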
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
EFFICIENT INDEX FOR VERY LARGE DATASETS WITH HIGHER DIMENSION (AM Publications, India)
The main aim of this paper is to develop a new dynamic indexing structure, the NewTree, to support very large datasets and high dimensionality. This tree-based structure is designed to facilitate efficient access and is highly adaptable to any type of application. It is based on a nearest-neighbours method that avoids linearly scanning the very large dataset, and it minimizes the adverse effects of the curse of dimensionality, under which most existing indexing techniques degrade rapidly as dimensionality grows. A major challenge here is the retrieval of subsets from the huge storage system. The NewTree handles the addition of new data efficiently and effectively: when new data are added, the shape of the structure does not change. The performance of the newly developed structure is evaluated against the SR-Tree, an existing indexing structure, and the results clearly show that it is superior to the SR-Tree in both time complexity and memory complexity.
Fast top k path-based relevance query on massive graphs (ieeechennai)
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... (Neo4j)
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms, and is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect their personal devices and information.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk aims to encourage a more independent approach to using PHP frameworks, moving towards more flexible and future-proof PHP development.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Development Infographic
The algorithm operates as follows:
1. Create a set S = {s1, s2, …, sn} representing each office space.
2. Create a k-d tree T from the set S.
3. While S is not empty:
   a) Pop a point s from S and compute the radius r around it containing the nearest 50 neighboring buildings, using a pre-built SciPy KDTree and a starting maximum distance d = 0.082° (≅ 9 km).
   b) Using T, find all spaces within r, add them to a new cluster, and remove them from S.
   c) Merge the new cluster into an existing cluster, if there is overlap between them.
4. If the number of clusters is greater than k, recursively perform Steps 3-4 with the original set S and 2d as the new maximum distance. Otherwise, merge intersecting clusters and compute a weighted centroid and radius for each cluster.
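The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the production code; the use of scipy.spatial.KDTree and the function and parameter names are assumptions.

```python
# Hypothetical sketch of the clustering loop described above.
import numpy as np
from scipy.spatial import KDTree

def cluster_spaces(points, k, d=0.082, neighbors=50):
    """Cluster (lat, lon) points until at most k clusters remain,
    doubling the maximum radius d on each recursive pass (Step 4)."""
    points = np.asarray(points, dtype=float)
    tree = KDTree(points)                       # Step 2: build the k-d tree
    remaining = set(range(len(points)))
    clusters = []
    while remaining:                            # Step 3
        s = remaining.pop()
        # 3a: radius containing the nearest `neighbors` points, capped at d
        dists, _ = tree.query(points[s], k=min(neighbors, len(points)))
        r = min(float(np.max(dists)), d)
        # 3b: collect all spaces within r and remove them from the pool
        members = set(tree.query_ball_point(points[s], r)) & (remaining | {s})
        remaining -= members
        # 3c: merge into an existing cluster if the two overlap
        for c in clusters:
            if c & members:
                c |= members
                break
        else:
            clusters.append(members)
    if len(clusters) > k:                       # Step 4: retry with 2d
        return cluster_spaces(points, k, d=2 * d, neighbors=neighbors)
    return clusters
```

Capping the radius at d keeps geographically distant spaces out of the same cluster, which is what makes the recursive doubling in Step 4 necessary when too many small clusters result.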
Creating a Real-Time Recommendation Engine using Modified K-Means Clustering and Remote Sensing Signature Matching Algorithms
Abstract
Built on Google App Engine, RealMassive's recommendation engine encountered challenges scaling to match a 14% week-over-week increase in data. To address this problem of scale, we applied techniques from spectral data processing to transform our domain-specific problem. The result: a quantitative solution to a qualitative problem that can match the skill of domain experts while operating in sub-second time.
David Lippa*,†
Jason Vertrees**,†
Background
Spectral analysis algorithms provide one way to quantify similarity when comparing a data collection against a known signature. This process, material identification [3,6,9], is quite literally finding a needle in a pixelated haystack. One such algorithm, Spectral Angle Mapper, treats each pixel as an n-dimensional vector, computing the angle between it and the signature using the definition of the dot product: A · B = ||A|| ||B|| cos θ. Similarity increases as |θ| approaches 0; 10° is a typical upper threshold. Negative angles are valid in spectral datasets, but not in our case, since our values are always positive.
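The per-vector comparison reduces to a dot product and an inverse cosine. A minimal illustration of the idea (not the production implementation):

```python
import math

def spectral_angle_deg(a, b):
    """Angle theta between two n-dimensional vectors,
    from A . B = ||A|| ||B|| cos(theta)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    # Clamp guards against floating-point ratios slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))

# Collinear vectors match perfectly (theta near 0); orthogonal ones do not.
close = spectral_angle_deg([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # near 0 degrees
far = spectral_angle_deg([1.0, 0.0], [0.0, 1.0])              # near 90 degrees
```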
To remap the problem, we:
● Treat the list of potential candidates as “pixels” of a spectral data cube.
● Create a library of “signature” vectors.
● Cluster using a stripped-down version of SciPy's kdtree.py, since Google App Engine prohibits execution of native code in third-party libraries [2].
● Use independent object attributes for vector components, such as cost, size, number of parking spaces, etc.
● Avoid ratios and dependent variables.
● Aggregate each cluster's vector components to produce a “signature.”
● Sort the results in ascending order by θ.
This solution results in a quantifiable, accurate, and flexible measurement of similarity.
Phase 1: Clustering
K-means clustering is one of the best-known methods for breaking n data points into k discrete clusters. While easy to implement and fast in practice, a few worst-case scenarios may arise under certain unusual data conditions [8]. To mitigate this, we exploit known attributes of the data: limited overlap between data points, since they exist physically in 3-dimensional space; limited data range, since the data is clustered by latitude and longitude; and related data that can be used to improve estimation of the initial cluster sizes.
Results
Since its inception, the new recommendation service has provided more than 302,925 recommendations in sub-second time. With each call, it sifts through over 80,000 spaces, and it has handled a workload of 18,327 requests per work day and 6,188 per hour. The result was the product of just 3 weeks of implementation time, from design to production.
In the future, we will refine the clustering algorithm to consider client-specific needs and other related data sets. We can also improve the matching algorithm by applying a cosine rule or Euclidean distance calculation to prevent an extreme case of collinearity, such as the vectors (1, 1, 1) and (1000, 1000, 1000), from showing as a perfect match.
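The collinearity failure mode is easy to demonstrate numerically. A toy example in plain Python (not our production code):

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = (1.0, 1.0, 1.0), (1000.0, 1000.0, 1000.0)
sim = cosine(a, b)      # essentially 1.0: a "perfect" angular match
dist = euclidean(a, b)  # roughly 1730: the vectors differ hugely in magnitude
```

An angle-only score sees these two listings as identical; adding a distance term would penalize the magnitude gap.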
Summary
Google App Engine provides a powerful search engine in a scalable infrastructure. It can be customized to address new problems outside of typical keyword searching. To address our problem of pattern matching in commercial real estate, we created a new scalable, domain-specific recommendation engine. We borrowed techniques from the field of remote sensing, while also taking advantage of constraints and satisficing over optimizing to overcome our rapid data growth and the restrictions of Google App Engine.
* david.lippa@realmassive.com
** jason.vertrees@realmassive.com
† RealMassive, Inc. 1717 West 6th St. Austin, TX 78703
+ This data cube measures 614 x 512 pixels x 224 bands spanning the entire visible, near-infrared, and short-wave infrared spectrum. Visualizations provided by the open-source Opticks remote sensing toolkit [4].
References:
1. AVIRIS Home page. (2015, June 26). Retrieved from http://aviris.jpl.nasa.gov/data/free_data.html
2. Google. (2015, June 11). Google App Engine for Python 1.9.21 Documentation. Retrieved from https://cloud.google.com/appengine/docs/python
3. Landgrebe, David A. (2005). Signal Theory Methods in Multispectral Remote Sensing. Hoboken, NJ: John Wiley & Sons.
4. Opticks. (2015, June 26). Opticks remote sensing toolkit. Retrieved from https://opticks.org
5. RealMassive. (2015, June 10). Retrieved from https://www.realmassive.com
Method
There are 3 phases needed to overcome constraints imposed by App Engine [2]:
● Cluster user inputs into “signatures” to reduce the length of query strings and sort expressions.
● Apply fixed filters to limit search results to within the 10,000-hit sort limit.
● Score results by signature match to override the default search-term relevance score.
Doubling the initial radius results in an absolute maximum of 26 recursive calls, for an overall asymptotic complexity of O(2n log₂ n). This never happens in practice due to low building density. The final result is similar to the representation of clusters in Figure 3 [5]. Once the spaces have been clustered, it is trivial to compute the average of each vector component to produce each cluster's signature.
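The signature step is just a component-wise mean over each cluster. A sketch with made-up attribute values (the fields shown are illustrative, not the actual RealMassive schema):

```python
import numpy as np

# Each row is one space's vector of independent attributes,
# e.g. (cost, square feet, parking spaces). Values are invented.
cluster = np.array([
    [1200.0, 2500.0, 10.0],
    [1400.0, 2300.0, 14.0],
    [1000.0, 2700.0, 12.0],
])

# The cluster's "signature" is the per-component average.
signature = cluster.mean(axis=0)
```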
Figure 3: Clustering 50 spaces from across the US
Figure 2: Graphic representation of hyperspectral data [7]
Figure 1: A Commercial Real Estate Survey with Recommendations
Phase 2: Filtering
Next, we apply fixed filters informed by domain expertise. For commercial real estate, this includes the building type (e.g. "office" or "industrial"), location, and any necessary exclusions. These constraints produce a reasonable subset that can be matched against signatures.
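A fixed filter of this kind can be sketched as a simple predicate over candidate listings. The field names and helper below are hypothetical, chosen only to illustrate the shape of Phase 2:

```python
# Illustrative listings; the schema is an assumption, not RealMassive's.
listings = [
    {"id": 1, "type": "office",     "city": "Austin"},
    {"id": 2, "type": "industrial", "city": "Austin"},
    {"id": 3, "type": "office",     "city": "Dallas"},
]

def apply_fixed_filters(listings, building_type, city, excluded_ids=()):
    """Narrow the candidate set before signature matching, keeping it
    under the platform's 10,000-hit sort limit."""
    return [l for l in listings
            if l["type"] == building_type
            and l["city"] == city
            and l["id"] not in excluded_ids]

matches = apply_fixed_filters(listings, "office", "Austin")
```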
Figure 4: AVIRIS data courtesy NASA/JPL-Caltech, showing a signature match [1]+
6. M. Richmond. Licensed under Creative Commons. Retrieved from http://spiff.rit.edu/classes/phys301/lectures/comp/comp.html
7. N. Short, Sr. Graphic representation of hyperspectral data. Licensed under Creative Commons. Retrieved from http://rst.gsfc.nasa.gov/
8. A. Vattani. (2011). K-means requires exponentially many iterations even in the plane. Discrete Comput Geom, 45(4): 596–616.
9. H. Zhang, Y. Lan, R. Lacey, W. Hoffmann, Y. Huang. (2009). Analysis of vegetation indices derived from aerial multispectral and ground hyperspectral data. International Journal of Agricultural and Biological Engineering, 2(3): 33.
Acknowledgments
The authors would like to thank Fatih Akici, John Leonard, Natalya Shelburne, and Michael Westgate for their suggestions for this poster.
Phase 3: Sort by Angle
Executing the Spectral Angle Mapper algorithm on a reduced dataset of 10,000 items equates to performing material identification on a 115 x 87 pixel x 3-band data cube from a multi-spectral sensor, or 3% of the computations required for a small data cube such as Figure 4. Google App Engine can quickly perform calculations in-place on search results, but it lacks the inverse cosine function [2]. Our solution uses the cosine ratio as a proxy for the angle: sorting by the cosine ratio in descending order is equivalent to sorting by the angle in ascending order to find the most similar matches.
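Because arccos is monotonically decreasing on [-1, 1], ranking by the raw cosine ratio gives the same order as ranking by the angle itself. A sketch of that proxy sort, with assumed vector shapes:

```python
import math

def cos_theta(candidate, signature):
    """Cosine ratio between a candidate vector and a cluster signature."""
    dot = sum(c * s for c, s in zip(candidate, signature))
    norms = (math.sqrt(sum(c * c for c in candidate))
             * math.sqrt(sum(s * s for s in signature)))
    return dot / norms

def rank_by_angle(candidates, signature):
    # Descending cosine ratio == ascending spectral angle: best match first,
    # with no need for the inverse cosine function.
    return sorted(candidates, key=lambda c: cos_theta(c, signature), reverse=True)

signature = (1.0, 2.0, 3.0)
ranked = rank_by_angle([(3.0, 2.0, 1.0), (1.1, 2.0, 2.9), (2.0, 2.0, 2.0)],
                       signature)
```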