An evaluation of two popular segmentation algorithms, the mean shift-based segmentation algorithm and a graph-based segmentation scheme. We also consider a hybrid method which combines the other two methods.
From the asylum model to the community model: Psychiatrist training in the context of Peru's new mental health reform. Induction course for residents of the Instituto Nacional de Salud Mental. 2015
Social Media: Legal Pitfalls and Best Practices - SXSWedu 2016 - Diana Benner
Social media is here to stay, but as a leader you need to know how it can impact your district. Join us for a candid conversation about the top legal pitfalls of social media for school districts, as well as best practices for implementing social media in your district. Explore the evolution of legal decisions affecting First Amendment application in schools and practical recommendations for building your social media policy.
Multimodal Biometrics Recognition by Dimensionality Diminution Method - IJERA Editor
A multimodal biometric system uses two or more biometric modalities, e.g., face, ear, fingerprint, signature, and palmprint, to improve the recognition accuracy of conventional unimodal methods. In this paper we propose a new dimensionality reduction method called Dimension Diminish Projection (DDP). DDP not only preserves local information by capturing the intra-modal geometry, but also effectively extracts between-class relevant structures for classification. Experimental results show that our proposed method performs better than other algorithms, including PCA, LDA, and MFA.
Geoid height determination is one of the major problems of geodesy, because the use of satellite techniques in geodesy is increasing. Geoid heights can be determined using different methods according to the available data. Soft computing methods such as fuzzy logic and neural networks have become so popular that they are used to solve many engineering problems. Fuzzy logic theory and later developments in uncertainty assessment have enabled us to develop more precise models for our requirements. In this study, how to construct the best fuzzy model is examined. For this purpose, three different data sets were taken, and two different kinds of fuzzy model (two inputs, one output; and three inputs, one output) were formed for the calculation of geoid heights in Istanbul (Turkey). The results of these fuzzy models were compared with geoid heights obtained by GPS/levelling, and the fuzzy approximation models were evaluated on the test points.
Dual-time Modeling and Forecasting in Consumer Banking (2016) - Aijun Zhang
Longitudinal and survival data are naturally observed with multiple origination dates. They form a dual-time data structure, with the horizontal axis representing calendar time and the vertical axis representing lifetime. In this talk we discuss how to model dual-time data based on a decomposition strategy and how to forecast over the time horizon. Various statistical techniques are used for treating fixed and random effects.
Among other fields, we share the potential applications in quantitative risk management, and demonstrate a large-scale credit risk analysis powered by big data computing.
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B... - NTNU
The introduction of expert knowledge when learning Bayesian networks from data is known to be an excellent approach to boost the performance of automatic learning methods, especially when the data is scarce. Previous Bayesian-statistics approaches to this problem introduce the expert knowledge by modifying the prior probability distributions. In this study, we propose a new methodology based on Monte Carlo simulation which starts with non-informative priors and requests knowledge from the expert a posteriori, when the simulation ends. We also explore a new importance sampling method for Monte Carlo simulation and the definition of new non-informative priors for the structure of the network. All these approaches are experimentally validated with five standard Bayesian networks.
Read more:
http://link.springer.com/chapter/10.1007%2F978-3-642-14049-5_70
DEFINITION:
GIS is a powerful set of tools for collecting, storing, retrieving at will, transforming, and displaying spatial data from the real world for a particular set of purposes.
APPLICATION AREAS OF GIS
Agriculture
Business
Electric/Gas utilities
Environment
Forestry
Geology
Hydrology
Land-use planning
Local government
Mapping
Military
Risk management
Site planning
Transportation
Water / wastewater industry
COMPONENTS OF GIS
DATA INPUT
SPATIAL DATA MODEL
Data Model:
It describes, in an abstract way, how data are represented in an information system or a DBMS
Spatial Data Model :
The models or abstractions of reality that are intended to have some similarity with selected aspects of the real world
Creation of analogue and digital spatial data sets involves successive levels of model development and abstraction
Conceptual model : A view of reality
Analog model : Human conceptualization leads to analogue abstraction
Spatial data models : Formalization of analogue abstractions without any conventions
Database model : How the data are recorded in the computer
Physical computational model : Particular representation of the data structures in computer memory
Data manipulation model : Accepted axioms and rules for handling the data
Objects on the earth's surface are represented as continuous or discrete objects in spatial data models
Types of data models
Raster data model
Vector data model
RASTER DATA MODEL
Basic Elements :
Extent
Rows
Columns
Origin
Orientation
Resolution: pixel = grain = grid cell
Ex: Bit Map Image (BMP), Joint Photographic Experts Group (JPEG), Portable Network Graphics (PNG), etc.
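The raster elements listed above (extent, rows, columns, origin, resolution) can be sketched as a small data structure. This is a minimal illustration; the class and method names (Raster, cell_center) are made up here, not taken from any GIS library.

```python
from dataclasses import dataclass, field

@dataclass
class Raster:
    rows: int
    cols: int
    origin_x: float      # world x of the lower-left corner
    origin_y: float      # world y of the lower-left corner
    resolution: float    # cell size (pixel = grain = grid cell)
    cells: list = field(default_factory=list)

    def __post_init__(self):
        if not self.cells:
            # initialize a rows x cols grid of no-data cells
            self.cells = [[None] * self.cols for _ in range(self.rows)]

    def cell_center(self, row, col):
        """World coordinates of the center of cell (row, col)."""
        x = self.origin_x + (col + 0.5) * self.resolution
        y = self.origin_y + (row + 0.5) * self.resolution
        return (x, y)

    def extent(self):
        """(xmin, ymin, xmax, ymax) covered by the grid."""
        return (self.origin_x,
                self.origin_y,
                self.origin_x + self.cols * self.resolution,
                self.origin_y + self.rows * self.resolution)

r = Raster(rows=2, cols=3, origin_x=100.0, origin_y=200.0, resolution=10.0)
print(r.extent())          # (100.0, 200.0, 130.0, 220.0)
print(r.cell_center(0, 0)) # (105.0, 205.0)
```

Note how the extent is derived from origin, resolution, and the row/column counts rather than stored separately.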
VECTOR DATA MODEL
Basic Elements:
Location (x,y) or (x,y,z)
Explicit, i.e., pegged to a coordinate system
Different coordinate systems (and precisions) require different values:
o e.g., UTM as integers (but large)
o e.g., latitude and longitude as two signed floating-point numbers
Points are used to build more complex features
Ex: AutoCAD Drawing File (DWG), Drawing Interchange (Exchange) File (DXF), Vector Product Format (VPF), etc.
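The point-based construction above ("points are used to build more complex features") can be sketched in a few lines: explicit (x, y) locations combined into a polyline whose length is computed from its vertices. The helper name polyline_length is illustrative, not a GIS API.

```python
import math

# Points are explicit (x, y) locations pegged to a coordinate system;
# more complex features (here a polyline) are built from sequences of them.
def polyline_length(points):
    """Total length of a polyline built from a sequence of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

# A simple L-shaped feature built from three points
feature = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
print(polyline_length(feature))  # 3.0 + 4.0 = 7.0
```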
RASTER vs VECTOR
"Raster is faster, but vector is corrector."
TESSELLATIONS OF CONTINUOUS FIELDS
Triangular Irregular Network: (TIN)
TIN is a vector data structure for representing geographical information that is continuous
TIN is generally used to create a Digital Elevation Model (DEM)
DIGITAL ELEVATION MODEL
DATA STRUCTURES
A data structure describes how the data are stored.
Data organization in raster data structures:
Each cell is referenced directly
Each overlay is referenced directly
Each mapping unit is referenced directly
Each overlay is a separate file with a general header
To describe the dynamics taking place in networks whose structure changes over time, we propose an approach to search for attributes whose value changes impact the topology of the graph. In several applications, the variations of a group of attributes are often followed by structural changes in the graph that they can be assumed to generate. We formalize the triggering pattern discovery problem as a method jointly rooted in sequence mining and graph analysis. We apply our approach to three real-world dynamic graphs of different natures - a co-authoring network, an airline network, and a social bookmarking system - assessing the relevance of the triggering pattern mining approach.
Pre-computation for ABC in image analysis - Matt Moores
MCMSki IV (the 5th IMS-ISBA joint meeting)
January 2014
Chamonix Mont-Blanc, France
The associated journal article has now been uploaded to arXiv: http://arxiv.org/abs/1403.4359
The aim of the study was to investigate the relationship between 2D gray scale pixels and 3D gray scale pixels of image reconstructions in computed tomography (CT). The 3D space image reconstruction from data projection was a challenging and difficult research problem. The image was normally reconstructed from the 2D data from the CT data projection. In this descriptive study, a synthetic 3D Shepp-Logan phantom was used to simulate the actual data projection from a CT scanner. Real-time data projection of a human abdomen was also included in this study. Additionally, the graphical user interface for the application was designed using the Matlab Graphical User Interface Development Environment (GUIDE). The application was able to reconstruct 2D and 3D images in their respective spaces successfully. The image reconstruction for CT in 3D space was analyzed along with 2D space in order to show their relationships and shared properties for the purpose of constructing these images.
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, maintainers of Milvus.
Global Situational Awareness of A.I. and Where It's Headed - vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Learn SQL from Basic Queries to Advanced Queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
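The progression described above, from basic retrieval and filtering to aggregation, can be tried directly with Python's built-in sqlite3 module. The sales table and its columns below are invented purely for illustration.

```python
import sqlite3

# An in-memory database with a toy table; names are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 250.0), ("west", 80.0)])

# Foundations: data retrieval with filtering
rows = conn.execute(
    "SELECT region, amount FROM sales WHERE amount > 90 ORDER BY amount"
).fetchall()
print(rows)  # [('east', 100.0), ('east', 250.0)]

# Aggregation: total sales per region, largest first
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region "
    "ORDER BY SUM(amount) DESC"
).fetchall()
print(totals)  # [('east', 350.0), ('west', 80.0)]
```

The same queries run unchanged against any SQLite database file, which makes this a convenient sandbox for practicing filtering and aggregation.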
1. Generalized Notions of Data Depth
Spring 2015 Data Reading Seminar
Mukund Raj
12th Mar, 2015
2. Outline
1 Data Depth Background
What is Data Depth?
Geometrical Data Depth
General Properties of Data Depth
2 Generalized Notions of Data Depth
Functions
Multivariate Curves
Sets
Paths (on a graph)
3 Discussion
Relaxed Formulations
Advantages and Limitations of Data Depth
3. What is Data Depth?
A means of measuring how deep a data point p is within a cloud of points {p1, . . . , pn}.
A multivariate data analysis approach to generate order statistics which capture high-dimensional features and relationships.
A descriptive nonparametric method of statistical analysis.
4. Why is Data Depth Interesting?
Estimate the location from the center outward (with respect to the parent distribution).
Identify outliers.
Formulate quantitative and graphical methods for analyzing distributional characteristics such as location, scale, etc., as well as hypothesis testing.
Robustness.
5. Various Formulations of Data Depth
Geometrical (for data in Euclidean space):
L2 depth
Mahalanobis depth
Oja depth
Expected convex hull depth
Zonoid depth
Simplex depth
Half-space depth (also called Tukey depth or location depth)
Generalized (for complex data):
Functional band depth
Depth for multivariate curves
Sets
Paths on a graph
6. Geometrical data depth
Depth based on distances / volumes:
L2 depth
Mahalanobis depth
Oja depth
Depth based on weighted means:
Zonoid depth
Expected convex hull depth
Depth based on half spaces and simplices:
Tukey depth
Simplicial depth
[Mosler 2012]
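As a concrete instance of the distance-based family above, here is a minimal pure-Python sketch of Mahalanobis depth for 2-D point clouds, using the standard form MD(x) = 1 / (1 + (x − μ)ᵀ S⁻¹ (x − μ)) with sample mean μ and sample covariance S. Function names are illustrative.

```python
# Mahalanobis depth sketch for 2-D point clouds (pure Python).

def mean(pts):
    n = len(pts)
    return [sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n]

def covariance(pts):
    """Sample covariance matrix of a 2-D point cloud."""
    mu, n = mean(pts), len(pts)
    sxx = sum((p[0] - mu[0]) ** 2 for p in pts) / (n - 1)
    syy = sum((p[1] - mu[1]) ** 2 for p in pts) / (n - 1)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in pts) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def mahalanobis_depth(x, pts):
    """MD(x) = 1 / (1 + (x - mu)^T S^{-1} (x - mu))."""
    mu = mean(pts)
    (a, b), (c, d) = covariance(pts)
    det = a * d - b * c                     # invert the 2x2 covariance
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) \
        + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return 1.0 / (1.0 + q)

cloud = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
# The cloud's mean gets maximal depth; far-away points get depth near zero.
print(mahalanobis_depth(mean(cloud), cloud))     # 1.0
print(mahalanobis_depth((10, 10), cloud) < 0.1)  # True
```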
7. General Properties of Data Depth
1 Zero at infinity
2 Maximality at Center
3 Monotonicity
4 Affine Invariance
[Zuo and Serfling, 2000]
9. Function Ensembles
A function ensemble can be defined as {xi(t), i = 1, . . . , n, t ∈ I}, where I is an interval in ℝ and xi : I → ℝ.
Examples: time series observations such as the annual trend of temperature or precipitation, prices of commodities, heights of children versus age, etc.
10. Motivation for Functional Band Depth
Challenges with regular multivariate analysis of functions:
Curve ensembles that are sampled at different points.
Curse of dimensionality in current methods (e.g., PCA).
Contribution by [López-Pintado et al. 2009]: given an ensemble of functions (sampled from a distribution), a formulation of the data depth associated with a function.
11. Functional Band Depth Formulation
Figure: A functional band [López-Pintado et al. 2009].
Functional band formulation:
g ⊂ B(f1, · · · , fj) iff ∀x: min_{i∈{1,...,j}} fi(x) ≤ g(x) ≤ max_{i∈{1,...,j}} fi(x)  (1)
Functional band depth formulation:
BDj(g) = P(g ⊂ B(f1, · · · , fj))  (2)
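Equations (1)-(2) can be estimated from a finite ensemble by checking every j-element subset. A minimal pure-Python sketch, under the assumption that the functions are discretized on a common grid (function names are illustrative):

```python
from itertools import combinations

def in_band(g, band_fns):
    """Eq. (1): g lies in the band iff, at every sample x, g(x) is between
    the pointwise min and max of the band-forming functions."""
    return all(
        min(f[x] for f in band_fns) <= g[x] <= max(f[x] for f in band_fns)
        for x in range(len(g))
    )

def band_depth(g, ensemble, j=2):
    """Eq. (2), with the probability estimated by the fraction of
    j-subsets of the ensemble whose band contains g."""
    subsets = list(combinations(ensemble, j))
    return sum(in_band(g, s) for s in subsets) / len(subsets)

# Three constant functions sampled at 4 points: the middle one is deepest.
f1, f2, f3 = [0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]
ensemble = [f1, f2, f3]
print(band_depth(f2, ensemble))  # 1.0 (inside every pair's band)
print(band_depth(f1, ensemble))  # 2/3 (outside the band of f2 and f3)
```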
12. Visualization of Data Depth for Functions
Figure: Visualization of a function ensemble [López-Pintado et al. 2009].
Figure: Boxplot visualization of a function ensemble [Sun et al. 2011, Whitaker et al. 2013].
13. Multivariate Curve Ensembles
A parameterized curve can be defined in terms of an independent parameter s as c(s) = x̃(s), where c : D → R, D ⊂ ℝ, and R ⊂ ℝᵈ.
Examples: hurricane paths; brain tractography data; pathline ensembles in fluid simulation.
Figure: A synthetic ensemble of multivariate curves [Mirzargar et al. 2014].
14. Data Depth Formulation for Multivariate Curves
Figure: Band formed by 3 multivariate curves [López-Pintado et al. 2014, Mirzargar et al. 2014].
Curve band formulation:
g ⊂ B(ci1, · · · , cij) iff ∀x: g(x) ∈ simplex(ci1(x), · · · , cij(x))  (3)
Curve band depth formulation:
SBDj(g) = P(g ⊂ B(ci1, · · · , cij))  (4)
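The simplex membership test behind Eq. (3) can be sketched for 2-D curves with a barycentric point-in-triangle check (the j = 3 case, where the simplex is a triangle). Pure Python; the names and the small tolerance are illustrative choices.

```python
def in_triangle(p, a, b, c, eps=1e-12):
    """Barycentric-coordinate test: is point p inside triangle (a, b, c)?"""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    l3 = 1.0 - l1 - l2
    # inside iff all barycentric coordinates are (numerically) non-negative
    return min(l1, l2, l3) >= -eps

def in_curve_band(g, c1, c2, c3):
    """Eq. (3) with j = 3: g is in the band iff g(x) lies in the simplex
    of the three curves' values at every sample x."""
    return all(in_triangle(g[x], c1[x], c2[x], c3[x]) for x in range(len(g)))

# Three curves sampled at two parameter values, plus two candidate curves
c1 = [(0.0, 0.0), (0.0, 0.0)]
c2 = [(2.0, 0.0), (2.0, 0.0)]
c3 = [(1.0, 2.0), (1.0, 2.0)]
inside  = [(1.0, 0.5), (1.0, 0.5)]   # stays within the triangle throughout
outside = [(1.0, 0.5), (5.0, 5.0)]   # leaves the triangle at the second sample
print(in_curve_band(inside, c1, c2, c3))   # True
print(in_curve_band(outside, c1, c2, c3))  # False
```

Averaging this indicator over all 3-subsets of an ensemble would give the sample version of Eq. (4).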
15. Visualization of Data Depth for Curves
Figure: Chinese script replicated 100 times [López-Pintado 2014].
Figure: Curve boxplot for a hurricane path ensemble [Mirzargar et al. 2014].
16. Set / Isocontour Ensembles
Given an ensemble of real-valued functions f(x, y), the sublevel and superlevel sets for any particular isovalue form an ensemble of sets.
Examples: isocontours of a temperature field; isocontours of the pressure field in fluid dynamics simulations.
Figure: A synthetic ensemble of contours [Whitaker et al. 2013].
17. Data Depth Formulation for Sets
Figure: Examples of set bands [Whitaker et al. 2013].
Set band formulation:
S ∈ sB(S1, . . . , Sj) ↔ ∩_{k=1}^{j} Sk ⊂ S ⊂ ∪_{k=1}^{j} Sk  (5)
Set band depth formulation:
sBDj(S) = P(S ∈ sB(S1, . . . , Sj))  (6)
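Equations (5)-(6) translate directly to Python frozensets: S is in the band of S1, …, Sj iff the intersection of the Sk is contained in S and S is contained in their union. A minimal sketch with illustrative names:

```python
from itertools import combinations

def in_set_band(S, subsets):
    """Eq. (5): intersection(Sk) ⊂ S ⊂ union(Sk)."""
    inter = frozenset.intersection(*subsets)
    union = frozenset.union(*subsets)
    return inter <= S <= union        # <= is the subset test on sets

def set_band_depth(S, ensemble, j=2):
    """Eq. (6), estimated as the fraction of j-subsets whose band contains S."""
    bands = list(combinations(ensemble, j))
    return sum(in_set_band(S, b) for b in bands) / len(bands)

# Nested sets (e.g., cells inside nested isocontours): the middle one is deepest.
s1 = frozenset({1})
s2 = frozenset({1, 2})
s3 = frozenset({1, 2, 3})
ensemble = [s1, s2, s3]
print(set_band_depth(s2, ensemble))  # 1.0 (in every pair's band)
print(set_band_depth(s1, ensemble))  # 2/3 (fails for the pair s2, s3)
```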
18. Visualization of Data Depth for Sets
Figure: Contour boxplot for an ensemble of isocontours of a pressure field [Whitaker et al. 2013].
19. Paths (on a graph)
Let G = {V, E, W}. A path p can be denoted as p : I → V, where the index set I = (1, . . . , m).
Examples: paths of packets in computer networks; paths on transportation networks modelled as graphs.
Figure: A synthetic ensemble of paths on a graph.
20. Data Depth Formulation for Paths
Figure: Illustration of band formed by 3 paths.
Path band formulation:
p ∈ B(p1, . . . , pj) iff p(l) ∈ H[p1(l), . . . , pj(l)] ∀l ∈ I  (7)
Path band depth formulation:
pBDj(p) = E[χ(p ∈ B(p1, . . . , pj))]  (8)
21. Visualization of Data Depth for Paths
Figure: Path boxplots for paths on AS and road graphs.
23. Relaxed formulations
1 Modified band depth: instead of an indicator function, measure how much of the object lies inside the band.
2 Subsets: an indicator function with a relaxed threshold.
24. Advantages and Limitations
For combinatorial data depth formulations for complex data:
Advantages
No assumptions required about the underlying distribution.
Captures nonlocal relationships.
Robust.
Limitations
Computationally expensive for large ensembles.