Rigorous improvement of an image recognition model often requires multiple iterations of eyeballing outliers, inspecting statistics of the output labels, then modifying and retraining the model. When testing data is present at the petabyte scale, the ability to seamlessly access all the images that have been assigned specific labels poses a technical challenge by itself.
We share a solution that automates the process of running the model on the testing data and populating an index of the labels so they become searchable. Images and labels are stored in HBase. The model is encapsulated in a PySpark program, while the images are indexed with Solr and can be accessed from a Hue dashboard. Triplification of facts detected inside images contributes to a large knowledge graph, queryable via SPARQL.
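The abstract describes the indexing idea only at a high level. As a rough sketch of what "making labels searchable" means, here is a minimal pure-Python inverted index; all names and data are hypothetical stand-ins, and the actual solution stores images and labels in HBase and indexes them with Solr:

```python
from collections import defaultdict

def build_label_index(image_labels):
    """Build an inverted index mapping each label to the set of images
    that carry it, so labels become searchable (the role Solr plays in
    the pipeline described above)."""
    index = defaultdict(set)
    for image_id, labels in image_labels.items():
        for label in labels:
            index[label].add(image_id)
    return index

# Hypothetical model output: image id -> labels assigned by the classifier.
predictions = {
    "img-001": ["cat", "outdoor"],
    "img-002": ["dog", "outdoor"],
    "img-003": ["cat", "indoor"],
}

index = build_label_index(predictions)
print(sorted(index["cat"]))      # images tagged "cat"
print(sorted(index["outdoor"]))  # images tagged "outdoor"
```

At petabyte scale the same map step would run inside the PySpark job, with the resulting label-to-image postings written to Solr rather than held in memory.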
Improving computer vision models at scale – Dr. Mirko Kämpf
Marton Balassi, Mirko Kämpf, and Jan Kunigk share a solution that automates the process of running the model on the testing data and populating an index of the labels so they become searchable. Images and labels are stored in HBase. The model is encapsulated in a PySpark program, while the images are indexed with Solr and can be accessed from a Hue dashboard.
Improving computer vision models at scale – Jan Kunigk
We developed a solution that automatically adds tags to images via neural networks running with TensorFlow on Spark at scale. Images are stored in HBase, and tags become searchable via Solr.
The Vision & Challenge of Applied Machine Learning – Cloudera, Inc.
Learn how Cloudera provides a unified platform that breaks down data silos commonly seen in organizations. By unifying the data needed for applied machine learning, organizations are better equipped to gather valuable insights from their data.
Big Data LDN 2018: A Look Inside Applied Machine Learning – Matt Stubbs
Date: 13th November 2018
Location: Data-Driven LDN Theatre
Time: 13:10 - 13:40
Speaker: Brian Goral
Organisation: Cloudera
About: The field of machine learning (ML) ranges from the very practical and pragmatic to the highly theoretical and abstract. This talk describes several of the challenges facing organisations that want to leverage more of their data through ML, including some examples of the applied algorithms that are already delivering value in business contexts.
Leveraging Artificial Intelligence Processing on Edge Devices – ICS
The introduction of low-cost, high-performance embedded processors, coupled with improvements in neural network model optimization, lays the foundation for AI and computer vision at the edge. Moving intelligence from the cloud to the edge offers many advantages, including reduced network traffic, predictable ML inference times, and improved data security, to name a few. Challenges exist, as many development teams do not have data scientists or AI development engineers. What is needed are practical AI solutions, including ML development tools, optimized inference engines, and reference platforms that abstract away the development complexities to streamline prototyping and development.
In this joint webinar with Au-Zone Technologies we will discuss:
- Development challenges and solutions which can be used to enable AI/ML at the edge to implement object detection, classification, and tracking for medical and industrial use cases
- Visualization techniques for activity monitoring and object detection
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...) – Amazon Web Services
The AWS suite of managed services for IoT enables companies to quickly and easily deploy devices to the edge and synchronize their industrial time-series data from multiple sites to the AWS Cloud, where advanced analytics and machine learning can generate valuable insights about their business. In this session, learn how EDF Renewables used AWS Greengrass, AWS IoT Core, AWS IoT Analytics, and AWS Lambda to facilitate the collection, aggregation, and quality assurance of operational data from solar installations. Hear how working with AWS Professional Services transformed its approach to product development, and learn what challenges and solutions came with choosing leading-edge services from AWS.
1) The document discusses leveraging Modelica and FMI standards in Scilab open-source engineering software.
2) Key topics covered include Scilab use cases, integrating Modelica models into Scilab/Xcos, and using FMI for co-simulation and model exchange.
3) Demonstrations show automotive suspension modeling with Scilab/Xcos/Modelica, parameter identification in Xcos, and using FMI in Xcos for co-simulation.
Dagster - DataOps and MLOps for Machine Learning Engineers – Hong Ong
In this session, we will introduce Dagster, a cutting-edge framework that simplifies DataOps and MLOps for machine learning engineers. We will explore the benefits of this powerful tool, learn how to implement it in your machine learning workflows, and discuss practical use cases to help you enhance productivity, collaboration, and deployment of ML models.
Machine Learning Models: From Research to Production 6.13.18 – Cloudera, Inc.
Learn more about how data scientists can have the complete self-service capability to rapidly build, train, and deploy machine learning models, and how organisations can accelerate machine learning from research to production, while preserving the flexibility and agility that data scientists and modern business use cases demand.
Cloudera Altus: Big Data in the Cloud Made Easy – Cloudera, Inc.
Cloudera Altus makes it easier for data engineers, ETL developers, and anyone who regularly works with raw data to process that data in the cloud efficiently and cost effectively. In this webinar we introduce our new platform-as-a-service offering and explore challenges associated with data processing in the cloud today, how Altus abstracts cluster overhead to deliver easy, efficient data processing, and unique features and benefits of Cloudera Altus.
The document discusses artificial intelligence (AI) and data analytics opportunities in the oil and gas industry. It provides examples of how AI can be used for virtual assistants, video analytics, fleet management, production optimization, preventative maintenance, and precision drilling. The benefits of these applications include increased profitability, risk mitigation, improved agility, and cost reductions. It also discusses Dell Technologies AI and HPC solutions that can enable oil and gas companies to leverage AI for applications like reservoir modeling and autonomous vehicles.
The document discusses the evolving concept of cloud computing, comparing it to historical notions of computer utilities. It describes cloud computing as having 3 service models (SaaS, PaaS, IaaS) and 4 deployment models (private, public, hybrid, community clouds) with 5 essential characteristics including on-demand access and elastic scaling. However, the exact definition of cloud computing remains unclear as the technology continues to change.
The document discusses several concepts related to cloud computing including:
- The evolving definition of cloud computing with 3 service models and 4 deployment models and 5 essential characteristics
- Diffusion of innovation theory and how new technologies are adopted over time through innovators, early adopters, early majority, late majority, and laggards
- The concept of ubiquity and how innovations progress from niche to mainstream as they become more established, standardized, and commoditized over time
The document discusses Oracle's autonomous database technology. It summarizes that autonomous databases can self-drive, self-repair, and self-secure with reduced human labor. Machine learning is used to continuously optimize databases and adapt to changing workloads. This allows DBAs to focus on higher value tasks like innovation rather than maintenance operations. Oracle's autonomous database is presented as the world's first fully autonomous database.
Machine Learning Model Deployment: Strategy to Implementation – DataWorks Summit
This talk will introduce participants to the theory and practice of machine learning in production. The talk will begin with an intro on machine learning models and data science systems and then discuss data pipelines, containerization, real-time vs. batch processing, change management and versioning.
As part of this talk, an audience will learn more about:
• How data scientists can have the complete self-service capability to rapidly build, train, and deploy machine learning models.
• How organizations can accelerate machine learning from research to production while preserving the flexibility and agility that data scientists and modern business use cases demand.
A small demo will showcase how to rapidly build, train, and deploy machine learning models in R, Python, and Spark, and continue with a discussion of API services, RESTful wrappers/Docker, PMML/PFA, ONNX, SQL Server embedded models, and Lambda functions.
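As an illustration of the REST-style serving pattern the demo discusses, here is a minimal, self-contained Python sketch. The linear "model", its weights, and the function names are hypothetical; a real deployment would put this behind an HTTP framework inside a Docker container, or export the model via a standard like PMML/PFA:

```python
import json

# Hypothetical stand-in for a trained model: a linear scorer with fixed weights.
WEIGHTS = {"age": 0.03, "income": 0.00001}
BIAS = -1.2

def score(features):
    """Apply the 'model' to a feature dict and return a raw score.
    Unknown features are ignored (weight 0)."""
    return BIAS + sum(WEIGHTS.get(k, 0.0) * v for k, v in features.items())

def handle_request(body):
    """REST-style wrapper: JSON string in, JSON string out.
    This is the part an HTTP framework would call per request."""
    features = json.loads(body)
    return json.dumps({"score": round(score(features), 4)})

print(handle_request('{"age": 35, "income": 50000}'))
```

The design point is that scoring logic and transport are separated: `score` can be unit-tested and versioned independently of whatever serving layer wraps `handle_request`.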
Speakers
Sagar Kewalramani, Solutions Architect
Cloudera
Justin Norman, Director, Research and Data Science Services
Cloudera Fast Forward Labs
This document discusses an approach to enterprise metadata integration using a multilayer metadata model. Key points include:
- Status dashboards provide facts from technical, operational, application, and quality metadata layers
- A graph database allows for context exploration across the entire cluster
- The integration of metadata from multiple sources provides a more holistic view of business knowledge
This document summarizes a keynote talk given by Gurvinder Singh Ahluwalia (Guri), CTO of Cloud Computing at IBM, about IBM's cloud portfolio. The talk discusses how technology disruptions like cloud computing, mobile devices, and big data are impacting businesses. It outlines IBM's response to help clients think about, build, and tap into cloud solutions through its end-to-end cloud portfolio including public, private and hybrid cloud offerings. Specific examples discussed include workload analysis and migration to cloud, high performance computing, and IBM's acquisition of SoftLayer to gain infrastructure as a service capabilities.
The document discusses IBM's AI tools and capabilities. It summarizes IBM's suite of AI products including Watson Studio, Watson Machine Learning, Watson OpenScale, and the Watson Knowledge Catalog which help with data preparation, building and training models, deploying and managing models, and ensuring trusted AI. It also discusses IBM's strategy around automating the AI lifecycle through capabilities like transfer learning, neural network search, and AutoAI experiments.
This document provides an overview of several cloud simulation tools: CloudSim, CloudAnalyst, GreenCloud, and iCanCloud. CloudSim enables modeling and simulation of cloud computing infrastructures and applications. CloudAnalyst focuses on simulating large-scale cloud applications and studying their behavior under different deployment configurations using a graphical user interface. GreenCloud extends the NS2 network simulator to enable energy-aware cloud computing simulations at the packet level. iCanCloud allows modeling both existing and non-existing cloud architectures through a flexible hypervisor module and graphical interface to simulate distributed systems.
Building Microservices in the cloud at AutoScout24 – Christian Deger
The document summarizes the transformation of AutoScout24 from a monolithic architecture hosted on-premises to a microservices architecture hosted on AWS. It discusses the goals of breaking into autonomous teams organized around business capabilities. It outlines the architectural principles adopted such as shared nothing, event sourcing, and infrastructure as code. It also covers how the teams are organized to support a DevOps culture and continuous delivery.
How to build containerized architectures for deep learning - Data Festival 20... – Antje Barth
When it comes to AI, data scientists and engineers tend to focus on tools. But the data platform that enables these tools is equally important and often overlooked. In fact, 90% of the effort required for success in ML is not the algorithm – it's the data logistics. In this workshop we will talk about common architecture blueprints for integrating AI in your data centers and how the right data platform choice can make all the difference in launching your AI use case into production! Presented at Data Festival Munich, 2019.
AWS re:Invent 2018 - AIM302 - Machine Learning at the Edge – Julien Simon
- Machine learning at the edge?
- Leveraging AWS services
- Case study: Toyota Connected Data Services
- Alternative scenarios
- Optimizing for inference at the edge
- Getting started
The document discusses the evolution of cloud computing from its early conceptualization to its current form. It explores how cloud computing has progressed from an undefined concept to widespread ubiquity due to increasing demand and continuous improvements by suppliers. Key factors that have driven this transition include the commoditization of infrastructure, the delivery of software and platforms as standardized services, and a shift towards viewing these resources as utilities rather than custom-built products.
The Certified Cloud Computing Associate (CCCA) program is designed to provide knowledge, skills, competency, and expertise to IT professionals.
Find out more: https://globalicttraining.com
Cheryl Wiebe - Advanced Analytics in the Industrial World – Rehgan Avon
2018 Women in Analytics Conference
https://www.womeninanalytics.org/
Cheryl will talk about her consulting practice in Industrial Solutions: analytics solutions for industrial IoT-enabled businesses, including the connected factory, connected supply chain, smart mobility, and connected assets. Her path to this practice has bounced between hands-on systems development, IT strategy, business process reengineering, supply chain analytics, manufacturing quality analytics, and now industrial IoT analytics. She spent time working in industry as a developer and as a management consultant, and started and sold a company, before settling in to pursue this topic as a career analytics consultant. Cheryl will shed light on what's happening in industrial companies struggling to make the transition to digital, what that means, and what barriers they're challenged with. She'll touch on how and where artificial intelligence, deep learning, and machine learning technologies are being used most effectively in industrial companies, and the unique challenges they are facing. Reflecting on what's changed over the years and her journey to witness it, Cheryl will pose what she considers important ideas for women (and men) pursuing an analytics career successfully and meaningfully.
Time Series Analysis Using an Event Streaming Platform – Dr. Mirko Kämpf
Advanced time series analysis (TSA) requires specialized data preparation procedures to convert raw data into useful and compatible formats.
In this presentation you will see some typical processing patterns for time series based research, from simple statistics to reconstruction of correlation networks.
The first case is relevant for anomaly detection and for protecting safety.
Reconstruction of graphs from time series data is a very useful technique to better understand complex systems like supply chains, material flows in factories, information flows within organizations, and especially in medical research.
With this motivation we will look at typical data aggregation patterns. We investigate how to apply analysis algorithms in the cloud. Finally, we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent Cloud.
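The correlation-network reconstruction mentioned in this abstract can be sketched in a few lines of plain Python: compute pairwise correlations and connect series whose correlation exceeds a threshold. The sensor series and the 0.8 threshold below are hypothetical illustrations, not values from the talk:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_network(series, threshold=0.8):
    """Connect two series with an edge when their absolute correlation
    reaches the threshold: the basic graph-reconstruction step."""
    names = list(series)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(series[a], series[b])) >= threshold:
                edges.add((a, b))
    return edges

# Hypothetical sensor series: s1 and s2 move together, s3 does not.
data = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [2, 4, 6, 8, 10],
    "s3": [5, 1, 4, 2, 3],
}
print(correlation_network(data))  # {('s1', 's2')}
```

At scale the pairwise step would be distributed (e.g. as a streaming or batch job on the platform the talk describes), but the resulting edge list is the same kind of object.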
In this presentation I review the architecture of an AI application for IoT environments.
Since specific modeling and training aspects also have an impact on the final implementation of an enterprise-ready solution, such solutions become very complex very quickly.
The complexity of AI systems for IoT is a big challenge – thus, I break this complexity down into particular views, which emphasize the individual but still interconnected aspects more clearly.
More Related Content
Similar to Improving computer vision models at scale (Strata Data NYC)
Machine Learning Models: From Research to Production 6.13.18Cloudera, Inc.
Learn more about how data scientists can have the complete self-service capability to rapidly build, train, and deploy machine learning models, and how organisations can accelerate machine learning from research to production, while preserving the flexibility and agility data scientists and modern business use cases demand.
Cloudera Altus: Big Data in the Cloud Made EasyCloudera, Inc.
Cloudera Altus makes it easier for data engineers, ETL developers, and anyone who regularly works with raw data to process that data in the cloud efficiently and cost effectively. In this webinar we introduce our new platform-as-a-service offering and explore challenges associated with data processing in the cloud today, how Altus abstracts cluster overhead to deliver easy, efficient data processing, and unique features and benefits of Cloudera Altus.
The document discusses artificial intelligence (AI) and data analytics opportunities in the oil and gas industry. It provides examples of how AI can be used for virtual assistants, video analytics, fleet management, production optimization, preventative maintenance, and precision drilling. The benefits of these applications include increased profitability, risk mitigation, improved agility, and cost reductions. It also discusses Dell Technologies AI and HPC solutions that can enable oil and gas companies to leverage AI for applications like reservoir modeling and autonomous vehicles.
The document discusses the evolving concept of cloud computing, comparing it to historical notions of computer utilities. It describes cloud computing as having 3 service models (SaaS, PaaS, IaaS) and 4 deployment models (private, public, hybrid, community clouds) with 5 essential characteristics including on-demand access and elastic scaling. However, the exact definition of cloud computing remains unclear as the technology continues to change.
The document discusses several concepts related to cloud computing including:
- The evolving definition of cloud computing with 3 service models and 4 deployment models and 5 essential characteristics
- Diffusion of innovation theory and how new technologies are adopted over time through innovators, early adopters, early majority, late majority, and laggards
- The concept of ubiquity and how innovations progress from niche to mainstream as they become more established, standardized, and commoditized over time
The document discusses Oracle's autonomous database technology. It summarizes that autonomous databases can self-drive, self-repair, and self-secure with reduced human labor. Machine learning is used to continuously optimize databases and adapt to changing workloads. This allows DBAs to focus on higher value tasks like innovation rather than maintenance operations. Oracle's autonomous database is presented as the world's first fully autonomous database.
Machine Learning Model Deployment: Strategy to ImplementationDataWorks Summit
This talk will introduce participants to the theory and practice of machine learning in production. The talk will begin with an intro on machine learning models and data science systems and then discuss data pipelines, containerization, real-time vs. batch processing, change management and versioning.
As part of this talk, an audience will learn more about:
• How data scientists can have the complete self-service capability to rapidly build, train, and deploy machine learning models.
• How organizations can accelerate machine learning from research to production while preserving the flexibility and agility of data scientists and modern business use cases demand.
A small demo will showcase how to rapidly build, train, and deploy machine learning models in R, python, and Spark, and continue with a discussion of API services, RESTful wrappers/Docker, PMML/PFA, Onyx, SQLServer embedded models, and
lambda functions.
Speakers
Sagar Kewalramani, Solutions Architect
Cloudera
Justin Norman, Director, Research and Data Science Services
Cloudera Fast Forward Labs
This document discusses an approach to enterprise metadata integration using a multilayer metadata model. Key points include:
- Status dashboards provide facts from technical, operational, application, and quality metadata layers
- A graph database allows for context exploration across the entire cluster
- The integration of metadata from multiple sources provides a more holistic view of business knowledge
This document summarizes a keynote talk given by Gurvinder Singh Ahluwalia (Guri), CTO of Cloud Computing at IBM, about IBM's cloud portfolio. The talk discusses how technology disruptions like cloud computing, mobile devices, and big data are impacting businesses. It outlines IBM's response to help clients think about, build, and tap into cloud solutions through its end-to-end cloud portfolio including public, private and hybrid cloud offerings. Specific examples discussed include workload analysis and migration to cloud, high performance computing, and IBM's acquisition of SoftLayer to gain infrastructure as a service capabilities.
The document discusses IBM's AI tools and capabilities. It summarizes IBM's suite of AI products including Watson Studio, Watson Machine Learning, Watson OpenScale, and the Watson Knowledge Catalog which help with data preparation, building and training models, deploying and managing models, and ensuring trusted AI. It also discusses IBM's strategy around automating the AI lifecycle through capabilities like transfer learning, neural network search, and AutoAI experiments.
This document provides an overview of several cloud simulation tools: CloudSim, CloudAnalyst, GreenCloud, and iCanCloud. CloudSim enables modeling and simulation of cloud computing infrastructures and applications. CloudAnalyst focuses on simulating large-scale cloud applications and studying their behavior under different deployment configurations using a graphical user interface. GreenCloud extends the NS2 network simulator to enable energy-aware cloud computing simulations at the packet level. iCanCloud allows modeling both existing and non-existing cloud architectures through a flexible hypervisor module and graphical interface to simulate distributed systems.
Building Microservices in the cloud at AutoScout24Christian Deger
The document summarizes the transformation of AutoScout24 from a monolithic architecture hosted on-premises to a microservices architecture hosted on AWS. It discusses the goals of breaking into autonomous teams organized around business capabilities. It outlines the architectural principles adopted such as shared nothing, event sourcing, and infrastructure as code. It also covers how the teams are organized to support a DevOps culture and continuous delivery.
How to build containerized architectures for deep learning - Data Festival 20...Antje Barth
When it comes to AI data scientists/engineers tend to focus on tools. Though the data platform that enables these tools is equally important, it’s often overlooked. In fact, 90% of the effort required for success in ML is not the algorithm – it’s the data logistics. In this workshop we will talk about common architecture blueprints to integrate AI in your data centers and how the right data platform choice can make all the difference in launching your AI use case into production! Presented at Data Festival Munich, 2019.
AWS re:Invent 2018 - AIM302 - Machine Learning at the Edge Julien SIMON
Machine learning at the edge?
Leveraging AWS services
Case study: Toyota Connected Data Services
Alternative scenarios
Optimizing for inference at the edge
Getting started
The document discusses the evolution of cloud computing from its early conceptualization to its current form. It explores how cloud computing has progressed from an undefined concept to widespread ubiquity due to increasing demand and continuous improvements by suppliers. Key factors that have driven this transition include the commoditization of infrastructure, the delivery of software and platforms as standardized services, and a shift towards viewing these resources as utilities rather than custom-built products.
The Certified Cloud Computing Associate (CCCA) program is designed to provide knowledge, skills, competency and expertise to IT professionals
Find out More : https://globalicttraining.com
Cheryl Wiebe - Advanced Analytics in the Industrial WorldRehgan Avon
2018 Women in Analytics Conference
https://www.womeninanalytics.org/
Cheryl will talk about her consulting practice in Industrial Solutions, Analytic solutions for industrial IoT-enabled businesses, including connected factory, connected supply chain, smart mobility, connected assets. Her path to this practice has bounced between hands on systems development, IT strategy, business process reengineering, supply chain analytics, manufacturing quality analytics, and now Industrial IoT analytics. She spent time working in industry as a developer, as a management consultant, started and sold a company, before settling in to pursue this topic as a career analytics consultant. Cheryl will shed light on what's happening in industrial companies struggling to make the transition to digital, what that means, and what barriers they're challenged with. She'll touch on how/where artificial intelligence, deep learning, and machine learning technologies are being used most effectively in industrial companies, and what are the unique challenges they are facing. Reflecting on what's changed over the years, and her journey to witness this, Cheryl will pose what she considers important ideas to consider for women (and men) in pursuing an analytics career successfully and meaningfully.
Similar to Improving computer vision models at scale (Strata Data NYC) (20)
Time Series Analysis Using an Event Streaming PlatformDr. Mirko Kämpf
Advanced time series analysis (TSA) requires very special data preparation procedures to convert raw data into useful and compatible formats.
In this presentation you will see some typical processing patterns for time series based research, from simple statistics to reconstruction of correlation networks.
The first case is relevant for anomaly detection and to protect safety.
Reconstruction of graphs from time series data is a very useful technique to better understand complex systems like supply chains, material flows in factories, information flows within organizations, and especially in medical research.
With this motivation we will look at typical data aggregation patterns. We investigate how to apply analysis algorithms in the cloud. Finally we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent cloud.
In this presentation I review the architecture of an AI application for IoT environments.
Since specific modeling and training aspects also shape the final implementation of an enterprise-ready solution, such solutions become very complex very quickly.
The complexity of AI systems for IoT is a big challenge, so I break it down into particular views that emphasize the individual, but still interconnected, aspects more clearly.
PCAP Graphs for Cybersecurity and System TuningDr. Mirko Kämpf
This document discusses analyzing network traffic patterns in Hadoop clusters. Packet capture data was collected from example Hadoop workloads and analyzed using Gephi. Initial results show the network structure and communication between nodes for batch processing (TeraSort) and real-time streaming (Twitter collection). Further analysis aims to classify components, understand dependencies, and identify anomalies over time to better understand typical and atypical workload behavior.
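The node-to-node communication structure analyzed in Gephi can be derived from a capture by aggregating packets into weighted directed edges. A minimal sketch, assuming the packets have already been parsed into (src, dst) address pairs (e.g. with a pcap parser such as scapy, not shown here); the addresses and counts are made up.

```python
from collections import Counter

def edge_list(packets):
    """Aggregate parsed packets into weighted directed edges (src, dst, count).

    packets: iterable of (src_ip, dst_ip) pairs extracted from a capture.
    """
    counts = Counter(packets)
    return sorted((src, dst, n) for (src, dst), n in counts.items())

# Toy capture: a TeraSort-like shuffle between three nodes.
packets = [("10.0.0.1", "10.0.0.2")] * 3 + [("10.0.0.2", "10.0.0.3")] * 2
for src, dst, n in edge_list(packets):
    print(f"{src} -> {dst} weight={n}")  # rows importable into Gephi as a CSV edge list
```

Edge weights make heavy flows stand out, which is a starting point for classifying components and spotting anomalies over time.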
Etosha - Data Asset Manager : Status and road mapDr. Mirko Kämpf
The document provides an overview and roadmap for the first release of an open data asset manager called Etosha MDS. Key points include:
- Etosha MDS will expose metadata about datasets to enable discovery, exploration, and risk analysis of data assets.
- The first release will focus on collecting and exposing schema, statistics, and semantic annotations about datasets using tools like SPARQL and a graph browser.
- Future releases will integrate datasets across Hadoop clusters using a shared semantic knowledge graph and dataset integration layer following the data as a service paradigm.
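The SPARQL-style discovery described above boils down to basic graph-pattern matching over (subject, predicate, object) triples. A minimal pure-Python sketch of that idea, with made-up dataset names and predicates; a real Etosha deployment would use a triple store and an actual SPARQL endpoint.

```python
# Facts about datasets as (subject, predicate, object) triples, the data model
# behind SPARQL. Names and predicates here are illustrative only.
triples = {
    ("ds:clickstream", "ex:rowCount", 120000),
    ("ds:clickstream", "ex:tag", "web-logs"),
    ("ds:orders", "ex:tag", "sales"),
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching a basic graph pattern; None acts as a variable."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# Equivalent of: SELECT ?ds WHERE { ?ds ex:tag "web-logs" }
print(match(triples, p="ex:tag", o="web-logs"))
```

Federated querying then amounts to running the same pattern against the metadata services of several clusters and merging the results.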
From Events to Networks: Time Series Analysis on ScaleDr. Mirko Kämpf
Event processing, time series aggregation and analysis, and finally analysis of structural patterns between those data snippets can all be done on Hadoop clusters on huge data volumes.
In order to find hidden relations and invisible structures, one has to combine three disciplines using a variety of tools. Luckily, the Hadoop ecosystem offers many such tools. In this session you can see practical examples and a demonstration of the "Hadoop-Oscilloscope". Generic analysis patterns and recommendations for selecting appropriate algorithms will also provide additional background.
This document provides an overview of Apache Spark, including:
- Apache Spark is a next generation data processing engine for Hadoop that allows for fast in-memory processing of huge distributed and heterogeneous datasets.
- Spark offers tools for data science and components for data products and can be used for tasks like machine learning, graph processing, and streaming data analysis.
- Spark improves on MapReduce by being faster, allowing parallel processing, and supporting interactive queries. It works on both standalone clusters and Hadoop clusters.
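Spark's programming model is often introduced via its classic word-count pipeline. Running Spark itself requires a cluster or local mode, so what follows is a pure-Python analogue of the flatMap/map/reduceByKey chain; the equivalent PySpark expression is noted in the comment, and the input lines are made up.

```python
from collections import defaultdict

# Pure-Python analogue of Spark's classic RDD word count. In PySpark the same
# pipeline would read roughly:
#   sc.textFile(path).flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)
lines = ["spark makes processing fast", "spark runs on hadoop"]

pairs = [(word, 1) for line in lines for word in line.split()]  # flatMap + map
counts = defaultdict(int)
for word, n in pairs:                                           # reduceByKey
    counts[word] += n

print(sorted(counts.items()))
```

The difference at scale is that Spark distributes each stage across partitions and keeps intermediate results in memory, which is where its speedup over MapReduce comes from.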
This document summarizes research on simulating the spread of information during pedestrian evacuations. The researchers developed an agent-based model combining a lattice structure with social networks to represent pedestrian movement and information flow. Simulation results showed that information spread has a strong dependence on pedestrian density but is less influenced by building structure. The researchers also investigated how information quality evolves over time and space during an evacuation.
Information Spread in the Context of Evacuation OptimizationDr. Mirko Kämpf
The document describes a simulation of evacuation from a building using an agent-based model. Agents represent individuals, groups, or people with communication devices. The simulation analyzes how information spreads during evacuation and compares results between open and restricted geometries. Statistical analysis methods are applied to detect phases or transitions in the system. The impacts of different communication technologies and evacuation strategies are also studied. The goal is to define requirements for communication networks and sensors to optimize the evacuation process based on the simulation results.
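The density dependence reported above can be reproduced qualitatively with a toy agent-based model: agents occupy cells of a lattice, and information hops to occupied neighbouring cells at each step. Everything here (lattice size, update rule, density values) is illustrative and not the paper's actual simulation.

```python
import random

def spread_information(n, density, steps, seed=0):
    """Fraction of agents informed after `steps` steps on an n x n lattice.

    Agents occupy a random subset of cells; an informed agent informs any
    agent on a neighbouring cell each step. Toy model for illustration only.
    """
    rng = random.Random(seed)
    cells = [(x, y) for x in range(n) for y in range(n)]
    agents = rng.sample(cells, int(density * n * n))  # occupied cells
    occupied = set(agents)
    informed = {agents[0]}                            # one initially informed agent
    for _ in range(steps):
        newly = set()
        for (x, y) in informed:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nb = (x + dx, y + dy)
                if nb in occupied:
                    newly.add(nb)
        informed |= newly
    return len(informed) / len(agents)

# Higher pedestrian density lets information percolate much further.
print(spread_information(20, 0.2, 30), spread_information(20, 0.8, 30))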
(1) The document discusses challenges of managing large and complex datasets for interdisciplinary research projects. It presents Hadoop and the Etosha data catalog as solutions.
(2) Etosha aims to publish and link metadata about datasets to enable discovery and sharing across distributed research clusters. It focuses on descriptive, structural and administrative metadata rather than just technical metadata.
(3) Etosha's architecture includes a distributed metadata service and context browser that can query metadata from different Hadoop clusters to support federated querying and subquery delegation.
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"Dr. Mirko Kämpf
Since the numbers of hypertext pages and hyperlinks in the WWW have been continuously growing for more than 20 years, the problem of finding relevant content has become increasingly important. We have developed and evaluated techniques for a time-dependent characterization of the global and local relevance of WWW pages based on document length, number of links, and cross-correlations in user-access time series. We focus on content and user activity in selected groups of Wikipedia articles as a first application, mainly because of data availability. Our goal is the assignment of ranking values to a hypertext page (node). The values shall cover static properties of the node and its neighbourhood (context) as well as dynamic properties derived from its page-view rates, which depend on underlying communication processes. We show in several examples how this goal can be achieved.
06-20-2024-AI Camp Meetup-Unstructured Data and Vector DatabasesTimothy Spann
Tech Talk: Unstructured Data and Vector Databases
Speaker: Tim Spann (Zilliz)
Abstract: In this session, I will discuss unstructured data and the world of vector databases, and we will see how they differ from traditional databases, in which cases you need one, and in which you probably don't. I will also cover similarity search, where vectors come from, and an example of a vector database architecture, wrapping up with an overview of Milvus.
Introduction
Unstructured data, vector databases, traditional databases, similarity search
Vectors
Where, What, How, Why Vectors? We’ll cover a Vector Database Architecture
Introducing Milvus
What drives Milvus' Emergence as the most widely adopted vector database
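Brute-force similarity search over a handful of vectors makes the core idea concrete: score every stored vector against the query by cosine similarity and return the top-k. The document names and embeddings below are made up; a vector database such as Milvus replaces this linear scan with approximate indexes (e.g. HNSW) so the search scales to billions of vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, vectors, k=2):
    """Brute-force O(n) nearest-neighbour scan by cosine similarity."""
    scored = sorted(vectors.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Toy embeddings; real ones would come from an embedding model.
vectors = {
    "doc_cat": [0.9, 0.1, 0.0],
    "doc_dog": [0.8, 0.2, 0.1],
    "doc_car": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], vectors))
```

The two animal documents, whose vectors point in roughly the query's direction, rank ahead of the unrelated one.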
Essential Skills for Family Assessment - Marital and Family Therapy and Couns...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)Rebecca Bilbro
To honor ten years of PyData London, join Dr. Rebecca Bilbro as she takes us back in time to reflect on a little over ten years of working as a data scientist. One of the many renegade PhDs who joined the fledgling field of data science in the 2010s, Rebecca will share lessons learned the hard way, often from watching data science projects go sideways and learning to fix broken things. Through the lens of these canon events, she'll identify some of the anti-patterns and red flags she's learned to steer around.
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...PsychoTech Services