Machine Learning meets Granular Computing: the emergence of granular models in the Big Data era
** Presentation slides by Dr. Rafael Falcon of Larus Technologies, for the February 2018 Ottawa Machine Learning & Artificial Intelligence Meetup
Abstract
Traditional Machine Learning (ML) models are unable to effectively cope with the challenges posed by the many V’s (volume, velocity, variety, etc.) characterizing the Big Data phenomenon. This has triggered the need to revisit the underlying principles and assumptions ML stands upon. Dimensionality reduction, feature/instance selection, increased computational power and parallel/distributed algorithm implementations are well-known approaches to deal with these large volumes of data.
In this talk we will introduce Granular Computing (GrC), a vibrant research discipline devoted to the design of high-level information granules and their inference frameworks. By adopting more symbolic constructs such as sets, intervals or similarity classes to describe numerical data, GrC has paved the way for a more human-centric manner of interacting with and reasoning about the real world. We will go over several granular models that address common ML tasks such as classification/clustering and will outline a methodology to appropriately design information granules for the problem at hand. Though not a mainstream concept yet, GrC is a promising direction for ML systems to harness Big Data.
Machine Learning meets Granular Computing
1. 2/27/2018 1
Machine Learning meets Granular Computing:
the emergence of granular models in the Big Data era
Dr. Rafael Falcon, Larus Technologies & University of Ottawa
rfalcon@ieee.org
Machine Learning and Artificial Intelligence Ottawa
February 26, 2018 | Ottawa, Canada
2. Agenda
‣Introduction to Larus Technologies
‣The Big Data Era
‣Limitations of Traditional Machine Learning Models
‣Taming the Big Data Monster: Popular Solutions
‣Granular Computing: an Introduction
‣The Emergence of Granular [Machine Learning] Models
‣Examples of Granular Models
‣Granular Classifiers
‣Granular Clustering Algorithms
‣Granular Cognitive Maps
‣Conclusions
4. What We Do
High Level Information Fusion and Predictive Analytics
TECHNOLOGY
‣ Multi-source Information Fusion
‣ Behavioural Learning and Analysis Technology
ACTIVITIES
‣ Research and Engineering
‣ Custom Solutions Development
EXPERIENCE
‣ Defence and Security
‣ Maritime Logistics
‣ Commercial Video Surveillance Analytics
TOTAL::INSIGHT
Patented (USPTO) behavioural learning and analysis technology
Leveraging analytical tools to make data-driven decisions through the production of high-quality and time-critical information
5. Larus’ Key Competencies
‣ Big Data Analytics
‣ Data Visualization
‣ Data Infrastructure
‣ Data Warehousing
‣ Business Intelligence
‣ Knowledge Discovery
‣ Cloud and Large Scale Computing
‣ Statistical and Quantitative Analysis
‣ Software Engineering and Development
‣ Resellers
‣ White Papers
‣ Trade Shows
‣ Supply Channels
‣ Product Branding
‣ Thought Leadership
‣ Marketing and Sales Strategy
‣ Technical Presentations by SMEs
‣ Machine Learning
‣ Artificial Intelligence
‣ Predictive Analytics
‣ Data Mining and Fusion
‣ Video and Text Analytics
‣ Modeling and Simulation
‣ Multi-Objective Optimization
‣ Intelligent Processing Architectures
‣ Wireless Sensor and Robot Networks
8. NSERC Maritime IoT Research Project
Product will model and optimize the entire maritime supply chain!
Scenario 1 [Freight Ship Companies]
Scenario 2 [Port Authorities]
Scenario 3 [Trucking and Insurance Companies]
9. The Big Data Era
“Data is big when data size becomes part of the problem”
“Big Data is the result of collecting information at its most granular level — it’s what you get when you instrument a system and keep all of the data that your instrumentation is able to gather.”
Jon Bruner, O’Reilly Media
14. The Increasing V’s of Big Data
Some interesting V’s:
‣ Vanilla: Even the simplest models, constructed with rigor, can provide value.
‣ Varmint: As big data gets bigger, so can software bugs!
‣ Vivify: Data science has the potential to animate all manner of decision making and business processes, from marketing to fraud detection.
‣ Voodoo: Data science and big data aren't voodoo, but how can we convince potential customers of data science's value to deliver results with real-world impact?
15. Internet of Things (IoT)
“The interconnection via the Internet of computing devices embedded in everyday objects, enabling them to send and receive data”
16. Limitations of Traditional Machine Learning Models
‣ Volume:
‣ Training time could be computationally intractable with large data sets
‣ Velocity:
‣ Offline ML models are not suitable
‣ Variety:
‣ Curse of dimensionality: the amount of data needed to support a sound inference often grows exponentially with the dimensionality.
‣ Veracity:
‣ Most ML models do not take into account the uncertainty of the data; outlier detection alone will not do
‣ Volatility:
‣ Stationarity assumption: the data distribution will not change during analysis
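The velocity limitation above (offline models being unsuitable) can be illustrated with an incremental learner. A minimal sketch using scikit-learn (assumed available; the data stream and labels here are synthetic inventions for the example), where `partial_fit` updates the model one mini-batch at a time instead of retraining on the full history:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()      # linear model trained by stochastic gradient descent
classes = np.array([0, 1])   # must be declared up front for online learning

# Simulate a data stream arriving in mini-batches.
for _ in range(20):
    X = rng.normal(size=(100, 5))
    y = (X[:, 0] > 0).astype(int)             # label depends on the first feature
    model.partial_fit(X, y, classes=classes)  # incremental update, no full retrain

X_test = rng.normal(size=(500, 5))
y_test = (X_test[:, 0] > 0).astype(int)
print("held-out accuracy:", model.score(X_test, y_test))
```

Because each batch is discarded after the update, memory stays constant no matter how long the stream runs.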
17. Taming the Big Data Monster: Popular Solutions
‣ Dimensionality reduction
‣ “the process of reducing the number of random variables under consideration by obtaining a set of principal variables.” (Wikipedia)
‣ Two main manifestations:
‣ Feature selection
Select a subset of the original variables and discard the rest
Three types: filter, wrapper, embedded
‣ Feature extraction
Transforms the data in a high-dimensional space to a space of fewer dimensions.
Linear transformations (PCA, ICA, SVD, etc.)
Nonlinear transformations (autoencoders, Sammon’s mapping, Kernel PCA, etc.)
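Feature extraction via a linear transformation can be sketched in a few lines with PCA, one of the methods named above. The data here is synthetic, constructed so the signal genuinely lives in a low-dimensional subspace:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# 500 samples in 50 dimensions, but the signal lives in a 3-D subspace.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)   # project onto the top 3 principal components

print(X_reduced.shape)
print("variance retained:", pca.explained_variance_ratio_.sum())
```

Downstream models then train on the 3-column projection instead of all 50 original features, at almost no loss of information.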
18. Taming the Big Data Monster: Popular Solutions
‣ Instance selection
‣ “the process of reducing the number of instances (objects) in the data set.” (Wikipedia)
‣ “the optimal outcome of IS would be the minimum data subset that can accomplish the same task with no performance loss, in comparison with the performance achieved when the task is performed using the whole available data”
‣ Instance selection algorithms consider the trade-off between reduction rate and performance degradation.
‣ Two major types:
‣ Preserve instances at the boundaries of the classes
‣ Preserve internal instances of the classes
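A toy illustration of the first type, preserving boundary instances, hand-rolled with NumPy. The neighbourhood rule, the value of k, and the two-Gaussian data set are all illustrative choices, not an algorithm from the slides:

```python
import numpy as np

def select_boundary_instances(X, y, k=3):
    """Keep only instances whose k nearest neighbours contain a different
    class label, i.e. points near the decision boundary."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    np.fill_diagonal(d, np.inf)                         # exclude self-matches
    keep = []
    for i in range(len(X)):
        nn = np.argsort(d[i])[:k]
        if np.any(y[nn] != y[i]):   # a neighbour of another class -> boundary
            keep.append(i)
    return np.array(keep)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, size=(200, 2)), rng.normal(1, 1, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)
idx = select_boundary_instances(X, y)
print(len(idx), "of", len(X), "instances kept")
```

The interior of each class is discarded; the retained subset still traces the class boundary, which is what a nearest-neighbour classifier actually needs.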
19. Taming the Big Data Monster: Popular Solutions
‣ Parallel/distributed architectures
22. Taming the Big Data Monster: Popular Solutions
‣ More sophisticated algorithms (e.g. Deep Learning) that make use of these architectures
‣ Still very numerically driven
23. Taming the Big Data Monster: Popular Solutions
‣ Popular Deep Learning Frameworks
25. Granular Computing: An Introduction
‣ Three basic concepts underlying human cognition:
‣ Granulation: Decomposition of a whole into parts
‣ Organization: Integration of parts into a whole
‣ Causation: Association of causes with effects
‣ [Zadeh 1997] The granulation of an object leads to a collection of information granules
“Informally, a granule is a clump of values of a perception (e.g., perception of age), which are drawn together by proximity, similarity, or functionality. More concretely, a granule may be interpreted as a restriction on the values that a variable can take. In this sense, words in a natural language are, in large measure, labels of granules. A linguistic variable is a variable whose values are words or, equivalently, granules.”
26. Granular Computing: An Introduction
‣ Granular Computing (GrC) is an umbrella term to cover any theories,
methodologies, tools and techniques that employ information granules for
problem solving purposes.
‣ An information granule is a subset of the universe.
[Diagram] Information granules exist implicitly in humans and explicitly (operationally) in computer realizations, modeled from various points of view: fuzzy sets, rough sets, intervals (sets), clouds, shadowed sets, probability functions.
27. Why Granular Computing?
1. Truthful representation of the real world
‣ GrC provides true and natural representations of multi-level systems.
2. Consistent with human thinking and problem solving
‣ Human thinking operates at multiple levels of granularity and readily switches between them
3. Simplification of problems
‣ By omitting unnecessary and irrelevant details and focusing on the right level of
abstraction
4. Economic and low-cost solutions
‣ e.g. reduced computational overhead
Yao, Yiyu. "Perspectives of granular computing". 2005 IEEE International Conference on Granular Computing, Vol. 1, IEEE, 2005.
28. Granular Computing in the CI/AI Family
Granular Computing
Fuzzy sets
Neurocomputing
Evolutionary
optimization
Pedrycz, Witold “Granular Computing: Pursuing New Avenues of Computational Intelligence"
29. Granular Systems
‣ Granular Systems (GrS) is an umbrella term describing complex, intelligent systems that originate from general, frequently vague and imprecise specifications and employ information granularity at their basis.
‣ They use multiple granule models as building blocks and various GrC models to perform inference upon them.
‣ GrC has become an effective framework for the design and implementation of intelligent systems for various real-life applications.
‣ The developed systems exploit the tolerance for imprecision, uncertainty and partial truth under the Soft Computing framework, in order to achieve tractability, robustness and resemblance to human-like (natural) decision-making.
Szczuka, Marcin, et al. "Building granular systems: from concepts to applications" Rough Sets, Fuzzy
Sets, Data Mining, and Granular Computing. Springer International Publishing, 2015. 245-255.
30. Granular Systems
‣ Granular systems are concerned with the representation, construction, and
processing of information granules
‣ In GrC we deal with calculi of granules defined by elementary granules (e.g.,
indiscernibility or similarity classes) and some operations allowing us to
construct new granules from already defined ones by their amalgamation and
aggregation.
31. A Granular System’s Generic Architecture
Falcon, Rafael, et al. “A review of granular cognitive maps" , submitted to Granular Computing journal
32. Types of Information Granules: Crisp Sets (e.g., Intervals)
https://www.calvin.edu/~pribeiro/othrlnks/Fuzzy/fuzzysets.htm
33. Types of Information Granules: Fuzzy Sets
34. Some Membership Functions for Fuzzy Sets
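For instance, a triangular membership function takes only a few lines (a minimal sketch; the "middle-aged" fuzzy set over age and its parameters 30/45/60 are illustrative):

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a)    # rising edge
    right = (c - x) / (c - b)   # falling edge
    return np.clip(np.minimum(left, right), 0.0, 1.0)

# "Middle-aged" as a fuzzy set over age, peaking at 45
ages = [20, 35, 45, 55, 70]
print([float(round(m, 2)) for m in triangular(ages, 30, 45, 60)])
# → [0.0, 0.33, 1.0, 0.33, 0.0]
```

Unlike a crisp interval, membership degrades gradually away from the peak, which is exactly what makes the granule a fuzzy set.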
35. Types of Information Granules: Clustering
https://mubaris.com/2017/10/01/kmeans-clustering-in-python/
Clustering algorithm metadata:
• Prototypes
• Partition matrix
• Point density (core, boundary, outlier)
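The prototype-plus-metadata idea can be sketched with a plain k-means pass over synthetic data (a minimal sketch; the deterministic seeding and the core/boundary/outlier distance thresholds are illustrative choices, not the slide's exact pipeline):

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means: the centroids act as numeric prototypes
    (set-based information granules) summarizing the data."""
    # Deterministic init: spread the seeds along the first coordinate
    order = np.argsort(X[:, 0])
    centers = X[order[np.linspace(0, len(X) - 1, k).astype(int)]]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centers, labels = kmeans(X, 2)

# Metadata beyond the prototypes: tag each point as core/boundary/outlier
# by its distance to the nearest prototype (thresholds are illustrative)
d = np.linalg.norm(X - centers[labels], axis=1)
density = np.where(d < 0.4, "core", np.where(d < 0.9, "boundary", "outlier"))
print(centers.round(1))
```

The two centroids land near (0, 0) and (3, 3), and the density tags turn the raw partition into richer granular information.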
36. Types of Information Granules: Rough Sets
Pawlak, Zdzislaw “Rough set theory and its applications to data analysis“, Cybernetics & Systems
29:7, 661-668, 1998
Example. Predicting the loyalty of a new customer.

Identifier | Member Score | Online Payments | City | Click rate | Loyalty
customer1 | 68 | TRUE | Genk | 15/20 | Yes
customer2 | 21 | FALSE | Hasselt | 13/20 | Yes
customer3 | 43 | TRUE | Brussels | 0/20 | No
customer4 | 65 | FALSE | Leuven | 18/20 | Yes
customer5 | 37 | FALSE | Hasselt | 3/20 | No
customer6 | 68 | TRUE | Genk | 15/20 | No
customer7 | 29 | TRUE | Antwerp | 10/20 | ?
37. Types of Information Granules: Rough Sets
38. Types of Information Granules: Rough Sets
‣ The lower and upper approximations divide the universe of
discourse into three disjoint regions:
The positive region POS(X) comprises those objects certainly
related to the decision class → certainty.
The negative region NEG(X) comprises those objects certainly
not related to the decision class → certainty.
The boundary region BND(X) comprises those objects possibly
related to the decision class → possibility.
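Using the customer table from the earlier slide, the regions can be computed directly (a sketch assuming indiscernibility on the four condition attributes; note that customer1 and customer6 agree on all conditions yet disagree on Loyalty, so they land in the boundary region):

```python
from collections import defaultdict

# Condition attributes and decision (Loyalty) from the customer table
rows = {
    "customer1": ((68, True,  "Genk",     15), "Yes"),
    "customer2": ((21, False, "Hasselt",  13), "Yes"),
    "customer3": ((43, True,  "Brussels",  0), "No"),
    "customer4": ((65, False, "Leuven",   18), "Yes"),
    "customer5": ((37, False, "Hasselt",   3), "No"),
    "customer6": ((68, True,  "Genk",     15), "No"),
}

# Indiscernibility classes: objects with identical condition attributes
classes = defaultdict(set)
for name, (cond, _) in rows.items():
    classes[cond].add(name)

target = {n for n, (_, d) in rows.items() if d == "Yes"}  # X = loyal customers
lower = set().union(*(c for c in classes.values() if c <= target))
upper = set().union(*(c for c in classes.values() if c & target))

print(sorted(lower))          # POS(X): certainly loyal
print(sorted(upper - lower))  # BND(X): possibly loyal (customer1/6 conflict)
```

The lower approximation is {customer2, customer4} and the boundary is {customer1, customer6}, matching the three-region picture above.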
39. Types of Information Granules: Fuzzy Rough Sets
Dubois, Didier, and Henri Prade. "Rough fuzzy sets and fuzzy rough sets." International Journal of
General System 17.2-3 (1990): 191-209.
‣ The lower and upper approximations (crisp sets) in classical
RST are now modeled as fuzzy sets.
‣ Fuzzy tolerance relations replace the crisp equivalence relation
imposed by classical RST
‣ LowerApprox(Class = “Pass”) = {X1, X3, X4}
‣ FuzzyLowerApprox(Class = “Pass”) = (0.95, 0.2, 0.85, 0.90, 0.23)
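The fuzzy lower approximation can be sketched with the Kleene-Dienes implicator, one common choice in the Dubois-Prade framework (the tolerance relation R and the "Pass" memberships below are illustrative, not the slide's numbers):

```python
import numpy as np

# Fuzzy tolerance relation R (reflexive, symmetric) among 4 students
R = np.array([[1.0, 0.2, 0.7, 0.1],
              [0.2, 1.0, 0.3, 0.8],
              [0.7, 0.3, 1.0, 0.2],
              [0.1, 0.8, 0.2, 1.0]])
passed = np.array([0.9, 0.3, 0.8, 0.2])  # fuzzy set "Pass"

# Lower approximation with the Kleene-Dienes implicator:
# mu(x) = min_y max(1 - R(x, y), Pass(y))
lower = np.min(np.maximum(1.0 - R, passed[None, :]), axis=1)
print(lower.round(2))  # graded membership instead of a crisp set
```

Each object now belongs to the lower approximation to a degree, exactly as in the FuzzyLowerApprox example above.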
40. Information Granulation: Main Avenues
Two main avenues of information granulation:
‣ Principle of justifiable information granularity: data → a single information granule
‣ Data clustering: numeric data → a collection of information granules: set-based (K-Means), fuzzy sets (Fuzzy C-Means), rough sets (Rough C-Means), …
41. Designing Information Granules: Principle of Justifiable Granularity
‣ One of the fundamental principles in Granular Computing
‣ Leads to the creation of sound information granules (IGs)
‣ Two conflicting views of an IG:
‣ Coverage: The IG should be supported by the available (often numerical) data
‣ Specificity: The IG should be specific enough, i.e., it has to exhibit a tangible
meaning
Pedrycz, Witold, and Xianmin Wang. "Designing fuzzy sets with the use of the parametric principle of
justifiable granularity" IEEE Transactions on Fuzzy Systems 24.2 (2016): 489-496.
42. Designing Information Granules: Principle of Justifiable Granularity
‣ These two requirements are in conflict and a compromise must be achieved
‣ The principle of Justifiable Granularity can be formulated as an optimization
problem:
‣ generate initial representatives (IG) of the underlying data
‣ adjust the IGs (stretch, shrink) so the two requirements can be met to a
significant extent
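A minimal sketch of this optimization for a single interval granule, assuming the commonly used coverage × specificity product as the objective and a symmetric interval around the median (the functional in the cited paper is parametric and may differ):

```python
import numpy as np

def justifiable_interval(data, grid=200):
    """Grow an interval granule around the median, trading off
    coverage (fraction of data inside) against specificity
    (1 - interval length relative to the data range).
    Returns the interval maximizing their product."""
    data = np.sort(np.asarray(data, dtype=float))
    med, data_range = np.median(data), data[-1] - data[0]
    best, best_q = (med, med), -1.0
    for r in np.linspace(0.0, data_range, grid):   # candidate half-widths
        a, b = med - r, med + r
        coverage = np.mean((data >= a) & (data <= b))
        specificity = 1.0 - (b - a) / data_range   # shorter = more specific
        q = coverage * specificity
        if q > best_q:
            best, best_q = (a, b), q
    return best

rng = np.random.default_rng(0)
sample = rng.normal(10.0, 1.0, 500)
a, b = justifiable_interval(sample)
print(round(a, 1), round(b, 1))
```

Starting from a degenerate interval and stretching it step by step mirrors the "adjust the IGs (stretch, shrink)" idea: coverage pushes the bounds outward while specificity pulls them in.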
43. Traditional Machine Learning Models
44. Granular Machine Learning Models
45. Granular Models: Two Key Principles
‣ The model is built as a network of associations among information
granules.
‣ This supports the interpretability of the model, which can easily be translated
into a collection of rules whose condition and conclusion parts are formed by
the constructed information granules.
‣ Information granules form conceptually sound building blocks
(supported by data) and in light of their functionality, can be used in the
formation of a variety of relationships among input and output variables.
Reyes-Galaviz, Orion and Pedrycz, Witold “Granular fuzzy models: analysis, design and
evaluation" International Journal of Approximate Reasoning 64 (2015): 1-19.
46. Example: A Granular Neural Network
[Diagram: a granular neural network whose layers are built from information granules such as intervals and clusters]
47. Example: Traditional Fuzzy Rule-based System
if x is A_i then y = L_i(x, a_i), i = 1, 2, …, c
A_i – fuzzy set in the input space
L_i(x, a_i) – local model (linear)
Data-driven design process:
‣ condition parts developed through fuzzy clustering (e.g., Fuzzy C-Means, FCM)
‣ conclusion parts – estimation of parameters a_1, a_2, …, a_c
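This rule scheme is the classic Takagi-Sugeno model; its inference step can be sketched as membership-weighted averaging of the local linear models (the Gaussian condition sets and the two local models below are illustrative):

```python
import numpy as np

def gaussian(x, center, sigma=1.0):
    """Gaussian membership function for the condition fuzzy sets A_i."""
    return np.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def ts_infer(x, centers, params):
    """Takagi-Sugeno inference: 'if x is A_i then y = a_i0 + a_i1 * x',
    combined by membership-weighted averaging."""
    w = np.array([gaussian(x, c) for c in centers])         # rule activations
    y_local = np.array([a0 + a1 * x for a0, a1 in params])  # local models
    return float(np.sum(w * y_local) / np.sum(w))

centers = [0.0, 5.0]               # condition fuzzy sets (e.g. from FCM)
params = [(0.0, 1.0), (5.0, 2.0)]  # local models y = x and y = 5 + 2x
print(round(ts_infer(0.0, centers, params), 2))  # → 0.0 (rule 1 dominates)
```

Near a prototype, the corresponding local model dominates; between prototypes, the output blends smoothly, which is what the granular variants on the next slides generalize.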
48. Example: Granular Fuzzy Rule-based System
if x is G(A_i) then y = L_i(x, a_i), i = 1, 2, …, c
G(A_i) – type-2 fuzzy set in the input space (granular prototypes)
L_i(x, a_i) – local model (linear)
Granular condition space
49. Example: Granular Fuzzy Rule-based System
Granular conclusion space
if x is A_i then y = L_i(x, G(a_i)), i = 1, 2, …, c
A_i – fuzzy set in the input space
L_i(x, G(a_i)) – local model (linear) with granular parameters
50. Example: Granular Fuzzy Rule-based System
Granular condition and conclusion spaces
if x is G(A_i) then y = L_i(x, G(a_i)), i = 1, 2, …, c
G(A_i) – type-2 fuzzy set in the input space
L_i(x, G(a_i)) – local model (linear) with granular parameters
52. … to Granular Decision Trees
Balamash, A., Pedrycz, W., Al-Hmouz, R., & Morfeq, A. (2017). “Granular classifiers and their design
through refinement of information granules”. Soft Computing, 21(10), 2745-2759.
Idea: Granulate the input space by producing an initial number of granular
prototypes (IGs), then refine these prototypes according to their diversity
until a certain homogeneity level is reached.
Granular prototypes formed
via FCM-based clustering
53. Example: … to Granular Decision Trees
54. Example: … to Granular Decision Trees
55. Example: Granular clustering
Rubio, Elid, and Oscar Castillo. "A new proposal for a granular fuzzy C-Means algorithm" Design of
Intelligent Systems Based on Fuzzy Logic, Neural Networks and Nature-Inspired Optimization. Springer
International Publishing, 2015. 47-57.
Goal: Granulate the output of the Fuzzy C-Means (FCM) clustering algorithm in
order to produce region-based granular prototypes representing the clusters,
not just numerical prototypes as done in FCM.
Idea: Granulation-degranulation mechanism
56. Example: Granular clustering
Based on the reconstruction error computed by the granulation-degranulation
mechanism, two approximations of the original data can be built.
(Interval-based) granular prototypes and membership matrices can then be
derived from this information.
57. Example: Granular clustering
Original FCM Granular FCM
58. Example: Granular cognitive mapping
Goal: augment Fuzzy Cognitive Maps (FCMs) with information granules:
‣ Fuzzy sets → fuzzy cluster prototypes
‣ Rough sets → positive, negative and boundary regions
Three-step design process:
‣ Step 1: Information granulation
‣ Step 2: Topology construction (Fuzzy Cognitive Maps: concepts, weights)
‣ Step 3: Network exploitation (activation of the granular concepts)
59. Fuzzy Cognitive Maps
A type of recurrent neural network whose nodes denote system concepts and
whose causal connections are modeled via fuzzy sets
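FCM inference can be sketched as repeated propagation of concept activations through the causal weight matrix with a sigmoid squashing function (the three concepts and the weight matrix below are illustrative):

```python
import numpy as np

def fcm_run(W, a0, steps=20, lam=1.0):
    """Fuzzy Cognitive Map inference: repeatedly propagate concept
    activations through the causal weight matrix, squashing each
    update with a sigmoid so activations stay in (0, 1)."""
    a = np.asarray(a0, dtype=float)
    for _ in range(steps):
        a = 1.0 / (1.0 + np.exp(-lam * (W @ a)))  # A(t+1) = f(W · A(t))
    return a

# Three concepts; W[i, j] = causal influence of concept j on concept i
W = np.array([[ 0.0,  0.6, -0.4],
              [ 0.5,  0.0,  0.3],
              [-0.2,  0.7,  0.0]])
state = fcm_run(W, [0.8, 0.1, 0.1])
print(state.round(2))
```

With moderate weights the recurrence settles into a fixed point, which is then read off as the map's inference result.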
60. Granular Time Series Modeling
Wojciech Froelich, Witold Pedrycz (2017) “Fuzzy cognitive maps in the modeling of granular time
series”, Knowledge-Based Systems, Vol. 115, pp. 110 – 122
61. Granular FCM for Graded Multilabel Classification
Gonzalo Nápoles, Rafael Falcon, Elpiniki Papageorgiou, Rafael Bello and Koen Vanhoof (2016)
“Partitive Granular Cognitive Maps to Graded Multilabel Classification”, 2016 IEEE International
Conference on Fuzzy Systems (FUZZ-IEEE), Vancouver, Canada, July 24-29, 2016, pp. 1363-1370
‣ Graded Multilabel Classification (GMLC) = predict the membership
degree of an input pattern to multiple class labels
‣ Automatic construction of the granular FCM
‣ FCM input concepts = set of fuzzy cluster prototypes generated
through Fuzzy C-Means clustering
‣ FCM output concepts = set of decision classes
‣ FCM weights = learned from data using Particle Swarm Optimization
(PSO); three underlying topologies were explored
62. Granular FCM for Graded Multilabel Classification
GCM-1: input neurons are fully connected with each other and with each
decision label; only recurrent connections allowed in the output layer
63. Granular FCM for Graded Multilabel Classification
GCM-2: input neurons are fully connected with each other and with each
decision label; fully connected output layer to capture inter-label correlations
64. Granular FCM for Graded Multilabel Classification
GCM-3: fully connected topology
65. Granular FCM for Graded Multilabel Classification
‣ Experimental Results
‣ Generated GMLC datasets from UCI ML repositories via
Random Forests
‣ Performance metric: Normalized Mean Squared Error (NMSE)
‣ All three granular models perform comparably
‣ Not compared to other GMLC classifiers as they predict the ordinal
relation of labels instead of their exact membership grade
66. Rough Cognitive Networks (RCNs)
Nápoles, Gonzalo, Isel Grau, Elpiniki Papageorgiou, Rafael Bello, and Koen Vanhoof.
"Rough cognitive networks" Knowledge-Based Systems 91 (2016): 46-61.
‣ Granular FCMs whose input concepts are derived from the three regions
in Rough Set Theory
73. Rough Cognitive Networks (RCNs)
‣ To activate the neurons we compute the overlap between the similarity
class R(x) and each granule; e.g., for the positive region of decision class X_k:

A_i^(0) = |R(x) ∩ POS(X_k)| / |POS(X_k)|
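A sketch of this activation, assuming the similarity class and the rough-set regions are plain sets of object identifiers (the sets below are illustrative):

```python
# Positive-region neuron activation in a Rough Cognitive Network (sketch):
# the overlap between the similarity class R(x) of a new instance and the
# positive region of decision class X_k, normalized by |POS(X_k)|.
def positive_activation(similarity_class, pos_region):
    if not pos_region:
        return 0.0
    return len(similarity_class & pos_region) / len(pos_region)

similarity_class = {"o1", "o2", "o5"}  # objects similar to the input x
pos_region = {"o1", "o2", "o3", "o4"}  # POS(X_k) for one decision class
print(positive_activation(similarity_class, pos_region))  # → 0.5
```

Analogous ratios against the negative and boundary regions give the initial activations of the other granular input neurons.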
74. Rough Cognitive Networks (RCNs)
‣ Experimental takeaways
‣ RCNs are capable of outperforming standard ML classifiers
‣ User intervention in the RCN learning process is limited to the selection of
a single input parameter: the similarity threshold used to build the
similarity classes in the input space
‣ The RCN topology does not depend on the number of input features;
rather, it depends on the number of decision classes (C << M)
75. RCN Limitations
‣ Building the similarity relation requires specifying the proper granularity
degree. That is to say, evaluating each candidate threshold value requires
building the lower and upper approximations from scratch:

R: y R x ⟺ Similarity(x, y) ≥ threshold
76. Two Ways To Overcome RCN Limitations
Gonzalo Nápoles, Rafael Falcon, Elpiniki Papageorgiou, Rafael Bello and Koen Vanhoof (2017)
“Rough Cognitive Ensembles”, Int’l Journal of Approximate Reasoning, Vol 85, June 2017, pp. 79-96
Gonzalo Nápoles, Carlos Mosquera, Rafael Falcon, Isel Grau, Rafael Bello and Koen Vanhoof (2017)
“Fuzzy-Rough Cognitive Networks”, Neural Networks, to appear
‣ Rough Cognitive Ensembles: high prediction rates, but complex and difficult to understand
‣ Fuzzy-Rough Cognitive Networks: high prediction rates, simpler and comprehensible
77. Conclusions
‣ Granular Computing is a promising approach to tackle the
multifaceted challenges posed by Big Data and Internet of
Things
‣ The development of granular models is still in its infancy
‣ and largely confined to academic circles
‣ we are not aware of any commercially available implementation
‣ Lots to do in this field!
78. Thank you for your time!
Questions?
Rafael Falcon, Ph.D., SMIEEE
Research Scientist, Larus Technologies
Adjunct Professor, University of Ottawa
rfalcon@ieee.org