This is a deep learning presentation based on Deep Neural Network. It reviews the deep learning concept, related works and specific application areas.It describes a use case scenario of deep learning and highlights the current trends and research issues of deep learning
An introductory course on building ML applications with primary focus on supervised learning. Covers the typical ML application cycle - Problem formulation, data definitions, offline modeling, platform design. Also, includes key tenets for building applications.
Note: This is an old slide deck. The content on building internal ML platforms is a bit outdated and slides on the model choices do not include deep learning models.
Build a Recommendation Engine using Amazon Machine Learning in Real-timeAmazon Web Services
Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology. In this session, we will introduce how to use Amazon Machine Learning to create a data model, and use it to generate the real-time prediction for your application.
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
Slides for talk Abhishek Sharma and I gave at the Gennovation tech talks (https://gennovationtalks.com/) at Genesis. The talk was part of outreach for the Deep Learning Enthusiasts meetup group at San Francisco. My part of the talk is covered from slides 19-34.
This is a deep learning presentation based on Deep Neural Network. It reviews the deep learning concept, related works and specific application areas.It describes a use case scenario of deep learning and highlights the current trends and research issues of deep learning
An introductory course on building ML applications with primary focus on supervised learning. Covers the typical ML application cycle - Problem formulation, data definitions, offline modeling, platform design. Also, includes key tenets for building applications.
Note: This is an old slide deck. The content on building internal ML platforms is a bit outdated and slides on the model choices do not include deep learning models.
Build a Recommendation Engine using Amazon Machine Learning in Real-timeAmazon Web Services
Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology. In this session, we will introduce how to use Amazon Machine Learning to create a data model, and use it to generate the real-time prediction for your application.
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
Slides for talk Abhishek Sharma and I gave at the Gennovation tech talks (https://gennovationtalks.com/) at Genesis. The talk was part of outreach for the Deep Learning Enthusiasts meetup group at San Francisco. My part of the talk is covered from slides 19-34.
Time Series Databases for IoT (On-premises and Azure)Ivo Andreev
Devices from the IoT realm generate data in a rate and magnitude that make it practically impossible to retrieve valuable information without support of adequate AI engines.
Storing and serving billions of data measurements over time is also a non-trivial task addressed by the special class of Time Series DBs. Out of these, InfluxDB has the largest popularity, provides comprehensive documentation and above all - is available open source. As well Microsoft have recently released Azure Time Series Insights - cloud offering of a TS DB with the usability promises from the Microsoft brand.
This session is about managing and understanding IoT data.
Being able to analyze data in real-time will be a very hot topic for sure in near future. Not only for IoT-related tasks but as a general approach to user-to-machine or machine-to-machine interaction. From product recommendations to fraud detection alarms, a lot of stuff would be perfect if it could happen in real time. Now, with Azure Event Hubs and Stream Analytics, it’s possible. In this session, Davide will demonstrate how to use Event Hubs to quickly ingest new real-time data and Stream Analytics to query on-the-fly data, in order to do a real-time analysis of what’s happening right now.
a simple presentation about different big data stream processing systems such as SPARK, SAMZA and STORM and the difference between their architectures and purpose, in addition we talk about streaming layers tools such as Kafka and rabbitMQ, this presentation refer to this paper
https://vsis-www.informatik.uni-hamburg.de/getDoc.php/publications/561/Real-time%20stream%20processing%20for%20Big%20Data.pdf and other useful links.
Linear regression with gradient descentSuraj Parmar
Intro to the very popular optimization Technique(Gradient descent) with linear regression . Linear regression with Gradient descent on www.landofai.com
A brief introduction to Pattern Recognition. Slides were used for a Seminar at the Interactive Art PhD at School of Arts of the UCP, Porto, Portugal (http://artes.ucp.pt)
Data Science With Python | Python For Data Science | Python Data Science Cour...Simplilearn
This Data Science with Python presentation will help you understand what is Data Science, basics of Python for data analysis, why learn Python, how to install Python, Python libraries for data analysis, exploratory analysis using Pandas, introduction to series and dataframe, loan prediction problem, data wrangling using Pandas, building a predictive model using Scikit-Learn and implementing logistic regression model using Python. The aim of this video is to provide a comprehensive knowledge to beginners who are new to Python for data analysis. This video provides a comprehensive overview of basic concepts that you need to learn to use Python for data analysis. Now, let us understand how Python is used in Data Science for data analysis.
This Data Science with Python presentation will cover the following topics:
1. What is Data Science?
2. Basics of Python for data analysis
- Why learn Python?
- How to install Python?
3. Python libraries for data analysis
4. Exploratory analysis using Pandas
- Introduction to series and dataframe
- Loan prediction problem
5. Data wrangling using Pandas
6. Building a predictive model using Scikit-learn
- Logistic regression
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you'll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor has ranked data scientist first in the 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with python certification training course. With Simplilearn Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques.
Learn more at: https://www.simplilearn.com
How to Build a Recommendation Engine on SparkCaserta
How to Build a Recommendation Engine on Spark was a presentation given by Joe Caserta, CEO and founder of Caserta Concepts, at @AnalyticsWeek in Boston.
Boston's Data AnalyticsStreet Conference is a 2 day packed event with thought provoking keynotes, knowledge filled sessions, intense workshops, insightful panels, and real-world case studies - engaging analytics community with latest methodologies and trends. The conference encompasses largest Speaker-to-Attendee ratio for unmatched networking and learning opportunity.
For more information on the services and solutions Caserta Concepts offers, visit our website at http://casertaconcepts.com/.
Feature Engineering in Machine LearningKnoldus Inc.
In this Knolx we are going to explore Data Preprocessing and Feature Engineering Techniques. We will also understand what is Feature Engineering and its importance in Machine Learning. How Feature Engineering can help in getting the best results from the algorithms.
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAmazon Web Services
Amazon Athena is a new serverless query service that makes it easy to analyze data in Amazon S3, using standard SQL. With Athena, there is no infrastructure to setup or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena, it works directly with data stored in S3.
In this webinar, we will show you how easy it is to start querying your data stored in Amazon S3, with Amazon Athena. First we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Then, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
Learning Objectives:
• Learn about the capabilities and features of Amazon Athena
• Understand the different use cases
• Describe how to run queries and options to store and visualize results
• Understand integration with other AWS big data services such as Amazon QuickSight
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
In this final Weave Online User Group of 2019, David Aronchick asks: have you ever struggled with having different environments to build, train and serve ML models, and how to orchestrate between them? While DevOps and GitOps have made huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on the ways MLOps has helped to effectively infuse AI into production-grade applications through establishing practices around model reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction for MLOps as an industry, and how we can use it to move faster, with more stability, than ever before.
The recording of this session is on our YouTube Channel here: https://youtu.be/twsxcwgB0ZQ
Speaker: David Aronchick, Head of Open Source ML Strategy, Microsoft
Bio: David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, David led product management for Kubernetes at Google, launched GKE, and co-founded the Kubeflow project. David has also worked at Microsoft, Amazon and Chef and co-founded three startups.
Sign up for a free Machine Learning Ops Workshop: http://bit.ly/MLOps_Workshop_List
Weaveworks will cover concepts such as GitOps (operations by pull request), Progressive Delivery (canary, A/B, blue-green), and how to apply those approaches to your machine learning operations to mitigate risk.
Scikit-Learn is a powerful machine learning library implemented in Python with numeric and scientific computing powerhouses Numpy, Scipy, and matplotlib for extremely fast analysis of small to medium sized data sets. It is open source, commercially usable and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason Scikit-Learn is often the first tool in a Data Scientists toolkit for machine learning of incoming data sets.
The purpose of this one day course is to serve as an introduction to Machine Learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product, an actionable model that can be used in larger programs or algorithms; rather than as simply a research or investigation methodology.
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size. In this deep-dive session, through a complete ML model life-cycle example, you will walk away with:
MLflow concepts and abstractions for models, experiments, and projects
How to get started with MLFlow
Understand aspects of MLflow APIs
Using tracking APIs during model training
Using MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
Package, save, and deploy an MLflow model
Serve it using MLflow REST API
What’s next and how to contribute
Splunk Architecture | Splunk Tutorial For Beginners | Splunk Training | Splun...Edureka!
***** Splunk Training: https://www.edureka.co/splunk *****
This tutorial on Splunk Architecture will help you understand the various components that make up a Splunk distributed cluster and most importantly, it will give a detailed explanation of the architecture of Splunk.
This tutorial gives an overview of how search engines and machine learning techniques can be tightly coupled to address the need for building scalable recommender or other prediction based systems. Typically, most of them architect retrieval and prediction in two phases. In Phase I, a search engine returns the top-k results based on constraints expressed as a query. In Phase II, the top-k results are re-ranked in another system according to an optimization function that uses a supervised trained model. However this approach presents several issues, such as the possibility of returning sub-optimal results due to the top-k limits during query, as well as the prescence of some inefficiencies in the system due to the decoupling of retrieval and ranking.
To address this issue the authors created ML-Scoring, an open source framework that tightly integrates machine learning models into Elasticsearch, a popular search engine. ML-Scoring replaces the default information retrieval ranking function with a custom supervised model that is trained through Spark, Weka, or R that is loaded as a plugin in Elasticsearch. This tutorial will not only review basic methods in information retrieval and machine learning, but it will also walk through practical examples from loading a dataset into Elasticsearch to training a model in Spark, Weka, or R, to creating the ML-Scoring plugin for Elasticsearch. No prior experience is required in any system listed (Elasticsearch, Spark, Weka, R), though some programming experience is recommended.
Time Series Databases for IoT (On-premises and Azure)Ivo Andreev
Devices from the IoT realm generate data in a rate and magnitude that make it practically impossible to retrieve valuable information without support of adequate AI engines.
Storing and serving billions of data measurements over time is also a non-trivial task addressed by the special class of Time Series DBs. Out of these, InfluxDB has the largest popularity, provides comprehensive documentation and above all - is available open source. As well Microsoft have recently released Azure Time Series Insights - cloud offering of a TS DB with the usability promises from the Microsoft brand.
This session is about managing and understanding IoT data.
Being able to analyze data in real-time will be a very hot topic for sure in near future. Not only for IoT-related tasks but as a general approach to user-to-machine or machine-to-machine interaction. From product recommendations to fraud detection alarms, a lot of stuff would be perfect if it could happen in real time. Now, with Azure Event Hubs and Stream Analytics, it’s possible. In this session, Davide will demonstrate how to use Event Hubs to quickly ingest new real-time data and Stream Analytics to query on-the-fly data, in order to do a real-time analysis of what’s happening right now.
a simple presentation about different big data stream processing systems such as SPARK, SAMZA and STORM and the difference between their architectures and purpose, in addition we talk about streaming layers tools such as Kafka and rabbitMQ, this presentation refer to this paper
https://vsis-www.informatik.uni-hamburg.de/getDoc.php/publications/561/Real-time%20stream%20processing%20for%20Big%20Data.pdf and other useful links.
Linear regression with gradient descentSuraj Parmar
Intro to the very popular optimization Technique(Gradient descent) with linear regression . Linear regression with Gradient descent on www.landofai.com
A brief introduction to Pattern Recognition. Slides were used for a Seminar at the Interactive Art PhD at School of Arts of the UCP, Porto, Portugal (http://artes.ucp.pt)
Data Science With Python | Python For Data Science | Python Data Science Cour...Simplilearn
This Data Science with Python presentation will help you understand what is Data Science, basics of Python for data analysis, why learn Python, how to install Python, Python libraries for data analysis, exploratory analysis using Pandas, introduction to series and dataframe, loan prediction problem, data wrangling using Pandas, building a predictive model using Scikit-Learn and implementing logistic regression model using Python. The aim of this video is to provide a comprehensive knowledge to beginners who are new to Python for data analysis. This video provides a comprehensive overview of basic concepts that you need to learn to use Python for data analysis. Now, let us understand how Python is used in Data Science for data analysis.
This Data Science with Python presentation will cover the following topics:
1. What is Data Science?
2. Basics of Python for data analysis
- Why learn Python?
- How to install Python?
3. Python libraries for data analysis
4. Exploratory analysis using Pandas
- Introduction to series and dataframe
- Loan prediction problem
5. Data wrangling using Pandas
6. Building a predictive model using Scikit-learn
- Logistic regression
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you'll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor has ranked data scientist first in the 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with python certification training course. With Simplilearn Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques.
Learn more at: https://www.simplilearn.com
How to Build a Recommendation Engine on SparkCaserta
How to Build a Recommendation Engine on Spark was a presentation given by Joe Caserta, CEO and founder of Caserta Concepts, at @AnalyticsWeek in Boston.
Boston's Data AnalyticsStreet Conference is a 2 day packed event with thought provoking keynotes, knowledge filled sessions, intense workshops, insightful panels, and real-world case studies - engaging analytics community with latest methodologies and trends. The conference encompasses largest Speaker-to-Attendee ratio for unmatched networking and learning opportunity.
For more information on the services and solutions Caserta Concepts offers, visit our website at http://casertaconcepts.com/.
Feature Engineering in Machine LearningKnoldus Inc.
In this Knolx we are going to explore Data Preprocessing and Feature Engineering Techniques. We will also understand what is Feature Engineering and its importance in Machine Learning. How Feature Engineering can help in getting the best results from the algorithms.
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAmazon Web Services
Amazon Athena is a new serverless query service that makes it easy to analyze data in Amazon S3, using standard SQL. With Athena, there is no infrastructure to setup or manage, and you can start analyzing your data immediately. You don’t even need to load your data into Athena, it works directly with data stored in S3.
In this webinar, we will show you how easy it is to start querying your data stored in Amazon S3, with Amazon Athena. First we will use Athena to create the schema for data already in S3. Then, we will demonstrate how you can run interactive queries through the built-in query editor. We will provide best practices and use cases for Athena. Then, we will talk about supported queries, data formats, and strategies to save costs when querying data with Athena.
Learning Objectives:
• Learn about the capabilities and features of Amazon Athena
• Understand the different use cases
• Describe how to run queries and options to store and visualize results
• Understand integration with other AWS big data services such as Amazon QuickSight
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
In this final Weave Online User Group of 2019, David Aronchick asks: have you ever struggled with having different environments to build, train and serve ML models, and how to orchestrate between them? While DevOps and GitOps have made huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on the ways MLOps has helped to effectively infuse AI into production-grade applications through establishing practices around model reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction for MLOps as an industry, and how we can use it to move faster, with more stability, than ever before.
The recording of this session is on our YouTube Channel here: https://youtu.be/twsxcwgB0ZQ
Speaker: David Aronchick, Head of Open Source ML Strategy, Microsoft
Bio: David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, David led product management for Kubernetes at Google, launched GKE, and co-founded the Kubeflow project. David has also worked at Microsoft, Amazon and Chef and co-founded three startups.
Sign up for a free Machine Learning Ops Workshop: http://bit.ly/MLOps_Workshop_List
Weaveworks will cover concepts such as GitOps (operations by pull request), Progressive Delivery (canary, A/B, blue-green), and how to apply those approaches to your machine learning operations to mitigate risk.
Scikit-Learn is a powerful machine learning library implemented in Python with numeric and scientific computing powerhouses Numpy, Scipy, and matplotlib for extremely fast analysis of small to medium sized data sets. It is open source, commercially usable and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason Scikit-Learn is often the first tool in a Data Scientists toolkit for machine learning of incoming data sets.
The purpose of this one day course is to serve as an introduction to Machine Learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product, an actionable model that can be used in larger programs or algorithms; rather than as simply a research or investigation methodology.
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size. In this deep-dive session, through a complete ML model life-cycle example, you will walk away with:
MLflow concepts and abstractions for models, experiments, and projects
How to get started with MLFlow
Understand aspects of MLflow APIs
Using tracking APIs during model training
Using MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
Package, save, and deploy an MLflow model
Serve it using MLflow REST API
What’s next and how to contribute
Splunk Architecture | Splunk Tutorial For Beginners | Splunk Training | Splun...Edureka!
***** Splunk Training: https://www.edureka.co/splunk *****
This tutorial on Splunk Architecture will help you understand the various components that make up a Splunk distributed cluster and most importantly, it will give a detailed explanation of the architecture of Splunk.
This tutorial gives an overview of how search engines and machine learning techniques can be tightly coupled to address the need for building scalable recommender or other prediction based systems. Typically, most of them architect retrieval and prediction in two phases. In Phase I, a search engine returns the top-k results based on constraints expressed as a query. In Phase II, the top-k results are re-ranked in another system according to an optimization function that uses a supervised trained model. However this approach presents several issues, such as the possibility of returning sub-optimal results due to the top-k limits during query, as well as the prescence of some inefficiencies in the system due to the decoupling of retrieval and ranking.
To address this issue the authors created ML-Scoring, an open source framework that tightly integrates machine learning models into Elasticsearch, a popular search engine. ML-Scoring replaces the default information retrieval ranking function with a custom supervised model that is trained through Spark, Weka, or R that is loaded as a plugin in Elasticsearch. This tutorial will not only review basic methods in information retrieval and machine learning, but it will also walk through practical examples from loading a dataset into Elasticsearch to training a model in Spark, Weka, or R, to creating the ML-Scoring plugin for Elasticsearch. No prior experience is required in any system listed (Elasticsearch, Spark, Weka, R), though some programming experience is recommended.
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
Search engines have focused on solving the document retrieval problem, so their scoring functions do not handle naturally non-traditional IR data types, such as numerical or categorical. Therefore, on domains beyond traditional search, scores representing strengths of associations or matches may vary widely. As such, the original model doesn’t suffice, so relevance ranking is performed as a two-phase approach with 1) regular search 2) external model to re-rank the filtered items. Metrics such as click-through and conversion rates are associated with the users’ response to items served. The predicted selection rates that arise in real-time can be critical for optimal matching. For example, in recommender systems, predicted performance of a recommended item in a given context, also called response prediction, is often used in determining a set of recommendations to serve in relation to a given serving opportunity. Similar techniques are used in the advertising domain. To address this issue the authors have created ML-Scoring, an open source framework that tightly integrates machine learning models into a popular search engine (SOLR/Elasticsearch), replacing the default IR-based ranking function. A custom model is trained through either Weka or Spark and it is loaded as a plugin used at query time to compute custom scores.
Global introduction to elastisearch presented at BigData meetup.
Use cases, getting started, Rest CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
See webinar recording of this presentation at: https://resource.alibabacloud.com/webinar/live.htm?&webinarId=67
In this presentation, you will learn all you need to know about Elasticsearch, one of the most widely used open source search platforms in the world. We will walk you through what Elasticsearch is, why you need it, and show common use cases. First, we will introduce Elastic Search and the best practices for deploying it, as well as show what some of the salient features of the platform are. In the second part of the webinar, we delve into the various use cases for Elasticsearch and show why it is an excellent platform to query a large dataset. This includes a demo on querying a cluster. Finally, we will show how you can launch an elastic cluster on Alibaba Cloud and how to use Elasticsearch to query a large dataset for an autocomplete use case.
Learn more about Alibaba Cloud’s Elasticsearch offering:
https://www.alibabacloud.com/product/elasticsearch
Royal society of chemistry activities to develop a data repository for chemis...Ken Karapetyan
The Royal Society of Chemistry publishes many thousands of articles per year, the majority of these containing rich chemistry data that, in general, in limited in its value when isolated only to the HTML or PDF form of the articles commonly consumed by readers. RSC also has an archive of over 300,000 articles containing rich chemistry data especially in the form of chemicals, reactions, property data and analytical spectra. RSC is developing a platform integrating these various forms of chemistry data. The data will be aggregated both during the manuscript deposition process as well as the result of text-mining and extraction of data from across the RSC archive. This presentation will report on the development of the platform including our success in extracting compounds, reactions and spectral data from articles. We will also discuss our developing process for handling data at manuscript deposition and the integration and support of eLab Notebooks (ELNS) in terms of facilitating data deposition and sourcing data. Each of these processes is intended to ensure long-term access to research data with the intention of facilitating improved discovery.
The Royal Society of Chemistry publishes many thousands of articles per year, the majority of these containing rich chemistry data that, in general, in limited in its value when isolated only to the HTML or PDF form of the articles commonly consumed by readers. RSC also has an archive of over 300,000 articles containing rich chemistry data especially in the form of chemicals, reactions, property data and analytical spectra. RSC is developing a platform integrating these various forms of chemistry data. The data will be aggregated both during the manuscript deposition process as well as the result of text-mining and extraction of data from across the RSC archive. This presentation will report on the development of the platform including our success in extracting compounds, reactions and spectral data from articles. We will also discuss our developing process for handling data at manuscript deposition and the integration and support of eLab Notebooks (ELNS) in terms of facilitating data deposition and sourcing data. Each of these processes is intended to ensure long-term access to research data with the intention of facilitating improved discovery.
The audience will get to learn how to use ElasticSearch efficiently and reliably, when data scales up in their applications. It will be about tuning your ElasticSearch and configuring ElasticSearch internal queues and buffers for heavy indexing. Another takeaway will be some insight to internals of ElasticSearch.
Fusion 3.1 comes with exciting new features that will make your search more personal and better targeted. Join us for a webinar to learn more about Fusion's features, what's new in this release, and what's around the corner for Fusion.
RDBMS gave us table schemas. A table schema, which is an essential metadata component, gave us the power to validate data types, and enforce constraints. In the age of varying data and schema-less data stores, how can we enforce these rules and how can we leverage metadata (even in RDBMS) to empower data validity, code checks, and automation.
This is a brief background into Big data (data lake) to put in context the importance of metadata from a governance perspective and more especially in todays heterogeneous big data platforms.
From Pipelines to Refineries: Scaling Big Data ApplicationsDatabricks
Big data tools are challenging to combine into a larger application: ironically, big data applications themselves do not tend to scale very well. These issues of integration and data management are only magnified by increasingly large volumes of data.
Apache Spark provides strong building blocks for batch processes, streams and ad-hoc interactive analysis. However, users face challenges when putting together a single coherent pipeline that could involve hundreds of transformation steps, especially when confronted by the need of rapid iterations.
This talk explores these issues through the lens of functional programming. It presents an experimental framework that provides full-pipeline guarantees by introducing more laziness to Apache Spark. This framework allows transformations to be seamlessly composed and alleviates common issues, thanks to whole program checks, auto-caching, and aggressive computation parallelization and reuse.
IoT Stream: A Lightweight Ontology for Internet of Things Data Streams (GIoTS...Tarek Elsaleh
Abstract: In recent years, the development and deployment
of Internet of Things (IoT) devices has led to the generation of
large volumes of real world data. Analytical models can be used to
extract meaningful insights from this data. However, most of IoT
data is not fully utilised, which is mainly due to interoperability
issues and the difficulties to analyse data collected by heterogeneous
resources. To overcome this heterogeneity, semantic
technologies are used to create common models to share various
data originated from heterogeneous sources. However, semantics
add further overhead to data delivery, and the processing time
to annotate the data with the model can increase the latency and
complexity in publishing and querying the annotated data. In
this paper, we present a lightweight semantic model to annotate
IoT streams. The metadata descriptions that are provided in the
models are used for search and discovery of the data using various
attributes such as value and type. The proposed model extends
commonly used ontologies such as W3C/OGC SSN ontology
and its recent lightweight core, SOSA, and includes concepts
to describe streaming IoT data. We also show use cases, tools
and applications where the proposed model has been used.
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
Elasticsearch provides native integration with Apache Spark through ES-Hadoop. However, especially during development, it is at best cumbersome to have Elasticsearch running in a separate machine/instance. Leveraging Spark Cluster with Elasticsearch Inside it is possible to run an embedded instance of Elasticsearch in the driver node of a Spark Cluster. This opens up new opportunities to develop cutting-edge applications. One such application is Dataset Search.
Oscar will give a demo of a Dataset Search Engine built on Spark Cluster with Elasticsearch Inside. Motivation is that once Elasticsearch is running on Spark it becomes possible and interesting to have the Elasticsearch in-memory instance join an (existing) Elasticsearch cluster. And this in turn enables indexing of Datasets that are processed as part of Data Pipelines running on Spark. Dataset Search and Data Management are R&D topics that should be of interest to Spark Summit East attendees who are looking for a way to organize their Data Lake and make it searchable.
TLC2016 - A search engine for Blackboard Learn, the impossible made possible.BlackboardEMEA
Presenter: Machiels Wim
Organisation: KU Leuven
Description: Search engines have become an essential tool in our daily lives and anno 2016 we cannot imagine a system - which stores a large amount of documents - that is not searchable. Unfortunately Blackboard Learn doesn’t offer this functionality yet and it hasn’t been on the companies roadmap for years. Since students and staff at KU Leuven requested a search engine for their LMS for many years we decided to develop this functionality in-house.
In this talk we will demo our solution and show how we can index all content of a large Blackboard Learn deployment and provide personalized search results for all our users.
State of Florida Neo4j Graph Briefing - Cyber IAMNeo4j
Identity is based on relationships. Graph databases ensure those connections are current, scoped to actual requirements, and secure. David Rosenblum will discuss how customers from large financial institutions to smart home security systems are IAM enabled with Neo4j.
Similar to Nicola Pagni - Anomaly Detection in Elasticsearch (20)
Presentato al sesto WebMeetup del Machine Learning / Data Science Meetup Roma: https://www.meetup.com/it-IT/Machine-Learning-Data-Science-Meetup/events/273089965/
Presentazione per il sesto WebMeetup del Machine Learning / Data Science Meetup Roma: https://www.meetup.com/it-IT/Machine-Learning-Data-Science-Meetup/events/273089965/
Paolo Galeone - Dissecting tf.function to discover auto graph strengths and s...MeetupDataScienceRoma
Original presentation available on GitHub: https://pgaleone.eu/tf-function-talk/
Meetup: https://www.meetup.com/it-IT/Machine-Learning-Data-Science-Meetup/events/264338606/
Multimodal AI Approach to Provide Assistive Services (Francesco Puja)MeetupDataScienceRoma
Presentazione dal Meetup del Machine Learning / Data Science Meetup di Roma - Giugno 2019:
https://www.meetup.com/it-IT/Machine-Learning-Data-Science-Meetup/events/262120815/
Presentazione dal Meetup del Machine Learning / Data Science Meetup di Roma - Giugno 2019:
https://www.meetup.com/it-IT/Machine-Learning-Data-Science-Meetup/events/262120815/
Zero, One, Many - Machine Learning in Produzione (Luca Palmieri)MeetupDataScienceRoma
Talk dal Meetup del Machine Learning / Data Science Meetup di Roma - Giugno 2019:
https://www.meetup.com/it-IT/Machine-Learning-Data-Science-Meetup/events/262120815/
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5 − 15. These objects show compact half-light radii of R1/2 ∼ 50 − 200pc, stellar masses of
M⋆ ∼ 107−108M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr−1
. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Exposé invité Journées Nationales du GDR GPL 2024
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for the ultra-fast high-resolution imaging of cellular processes over time and space and were studied in its natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provide insights into the progression of disease, response to treatments or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
5. Elasticsearch
• Distributed data storage & information retrieval platform
• Search engine / aggregation engine / analytics
• Suggestions / percolation / highlighting / geo
• Document store
• Horizontally distributed
• Real time & near real time
• Apache License
• Proprietary plugins available for security & messaging
• Open Source
6. Elasticsearch
• Simple RESTful API
• Fast document retrieval
• Versatile - Many popular use cases:
• Log data analysis
• Full text search
• Analytics & Aggregations
• Data visualization w/ Kibana
• Alerting & classification w/ Percolator
• Suggestion engine
7. Elasticsearch
A Cluster is the largest container in Elasticsearch
A Node is a running instance of Elasticsearch
An Index is a lightweight container for data
A Document can be “indexed” and the results go into an “index”…It
is then searchable
A Shard is a single piece of an Elasticsearch index
…some key terms:
8. Elasticsearch
RESTful API to interact with Elasticsearch
GET - PUT - POST - HEAD -
DELETE
Manage Elasticsearch through API calls
Get and set configurations, mappings, templates, aliases, and index
setting
14. Elastic Stack
Anomaly Detection
Autonomous cars Voice Recognition
Fraud detection
Speech Recognition
Language Translation Entity Resolution
Predictive Medicine
Learn to Rank
RecommendationsImage Classification
15. Terminology
• Machine Learning
‒ Broad term, but X-Pack Machine Learning is automated anomaly detection for time-series data
(for now).
• Anomaly Detection
‒ Discovery of what’s “weird” or “different”, not what’s “bad”
• Unsupervised Learning
‒ Learning without human-labeled examples (without being “taught”)
• Bayesian
‒ An approach based on probability in which prior results are used to calculate probabilities of
certain present or future events
17. How does anomaly detection work end to end?
analysis baseline models
index
3. persist baselines
1. get historical data
2. auto-create baselines
now=T
20. Anomaly Detection Complements Rules
• Rules are not great for defining normal / unusual
• Rules for “normalcy” don’t evolve with data / infrastructure
What’s the right threshold ?
24. Anomalies in temporal pattern
•Single (univariate) time series
Example: Is there unusual traffic on website ?
25. Anomalies in temporal pattern
•Multiple time series
‒Multiple metrics
‒Single metric split by a field;
•Each series modeled
independently
Example:
Is there unusual web activity
from any country?
Time
Metric
26. IT Operational Analytics
Using rules & queries
Get notified when:
• Free disk space goes below 5%
• Elasticsearch cluster health is red
Using anomaly detection
Get notified when:
• Unexpected spike in error rates
• Unusual server CPU activity
Use Case
27. Security Analytics
Using rules & queries
Get notified when:
• > 5 failed logins on a machine in 5 min
• Process X starts on any server
Using anomaly detection
Get notified when:
• Login at unusual time / location
• Anomalous outbound data transfer
Use Case
28. Application Monitoring
Using rules & queries:
Get notified when:
• App response time exceeds SLA
• Active connections exceed threshold
Using anomaly detection
Get notified when:
• Unusual spike/dip in inactive users
• Sudden drop in app performance
Use Case
29. Marketing Analytics
Using rules & queries
Get notified when:
• Weekly activity for a user drops > 30%
• Activity on flagged account
Using anomaly detection
Get notified when:
• Sudden spike in visitors from a city
Use Case