An extension of this method appears in a recent paper: https://arxiv.org/ftp/arxiv/papers/1708/1708.05712.pdf
Overview and tutorial of Morse-Smale regression prior to a new paper coming out exploring this idea further. It is a topologically-based piecewise regression method for supervised learning.
Data Science Meetup: DGLARS and Homotopy LASSO for Regression Models - Colleen Farrelly
Short overview of two regression model extensions using differential geometry and homotopy continuation. Case study involves an open-source dataset that can be found on my ResearchGate page, along with the R code used in the analysis. Contains a short reference section for readers interested in learning more about the methods.
Presentation summarizes main content of Farrelly, C. M. (2017). Extensions of Morse-Smale Regression with Application to Actuarial Science. arXiv preprint arXiv:1708.05712.
Paper was accepted December 2017 by Casualty Actuarial Society.
Extending superlearner framework to survival analysis. Includes boosted regression, random forest, decision trees, Bayesian model average, and Morse-Smale regression.
Presents a new type of statistical model developed by Quantopo, LLC, based on generalized linear modeling and Tweedie regression that leverages the power of quantum computing. Paper is being written and will be uploaded to arXiv while under review.
Hierarchical clustering and topology for psychometric validation - Colleen Farrelly
From my graduate work and extended to the field of education.
Citation of paper from which presentation was derived:
Farrelly, C. M., Schwartz, S. J., Amodeo, A. L., Feaster, D. J., Steinley, D. L., Meca, A., & Picariello, S. (2017). The Analysis of Bridging Constructs with Hierarchical Clustering Methods: An application to identity. Journal of Research in Personality.
Logistic regression: topological and geometric considerations - Colleen Farrelly
Discussion and comparison of topologically- and geometrically-based machine learning algorithms on medical datasets with binary outcomes.
Paper submitted and ping-ponging between medical and data mining journals (interesting results, too weird of a fit), so I'm uploading the presentation version :)
Farrelly, C. M. (2017, October 17). Topology and Geometry in Machine Learning for Logistic Regression. Retrieved from psyarxiv.com/v8jgk
Deep vs diverse architectures for classification problems - Colleen Farrelly
Deep learning study, comparing deep learning methods with wide learning methods; applications include simulation data and real industry problems. Pre-print of paper found here: https://arxiv.org/ftp/arxiv/papers/1708/1708.06347.pdf
A short tutorial on Morse functions and their use in modern data analysis for beginners. Uses visual examples and analogies to introduce topological concepts and algorithms.
Creates heuristic guidelines for classifying types of networks empirically through a series of network metrics. Introduces metrics and theoretical background of what those network metrics measure with respect to the graph.
Updated Machine Learning by Analogy presentation that builds to more advanced methods (TensorFlow, geometry/topology-based methods...) and adds a section on time series methods.
A starter guide to the concepts and algorithms in machine learning, including regression frameworks, ensemble methods, clustering, optimization, and more. Mathematical knowledge is not assumed, and pictures/analogies demonstrate the key concepts behind popular and cutting-edge methods in data analysis.
Updated to include newer algorithms, such as XGBoost, and more geometrically/topologically-based algorithms. Also includes a short overview of time series analysis.
Cluster analysis is a data exploration (mining) tool
for dividing a multivariate dataset into “natural”
clusters (groups). We use the methods to explore
whether previously undefined clusters (groups) may
exist in the dataset.
In the classical model, the fundamental building block is the bit, which exists in one of two states, 0 or 1. Computations are done by logic gates acting on bits to produce other bits. As the number of bits increases, so do the complexity of the problem and the computation time. A quantum algorithm is a sequence of operations on a register that transforms it into a state which, when measured, yields the desired result. This paper provides an introduction to quantum computation by developing the qubit, quantum gates, and quantum circuits.
Data Analysis: Statistical Methods: Regression modelling, Multivariate Analysis - Classification: SVM & Kernel Methods - Rule Mining - Cluster Analysis, Types of Data in Cluster Analysis, Partitioning Methods, Hierarchical Methods, Density Based Methods, Grid Based Methods, Model Based Clustering Methods, Clustering High Dimensional Data - Predictive Analytics – Data analysis using R.
Generative AI for Social Good at Open Data Science East 2024 - Colleen Farrelly
A brief overview of generative AI technologies and their use for social good initiatives, including cultural training, medical image generation, drug design, and public health.
PyData Global 2023 talk overviewing case studies in network science, including stock market crash prediction, food price pattern mining, and stopping the spread of epidemics.
Overview of mathematical and machine learning models related to climate risk modeling, climate change simulations, and change point detection. Includes a hands-on session with geometry-based systems analysis of food prices related to climate change and geopolitical factors.
WiDS Workshop on natural language processing and generative AI. Details common methods that tie into coding examples. Ends with ethics discussion regarding these technologies and potential for misuse.
Link to talk on YouTube: https://www.youtube.com/watch?v=byGzKm0H1-8&list=PLHAk3jHXWpxI7fHw8m5PhrpSRpR3NIjQo&index=3
ODSC-East 2023 presentation covering topics related to my book, The Shape of Data, including how geometry plays a role in text/image embeddings, network science problems, survey data analytics, image analytics, and epidemic wrangling.
This talk overviews my background as a female data scientist, introduces many types of generative AI, discusses potential use cases, highlights the need for representation in generative AI, and showcases a few tools that currently exist.
Emerging Technologies for Public Health in Remote Locations.pptx - Colleen Farrelly
The tools available for public health interventions have changed significantly in the past decades. Tools from geometry, natural language processing, and generative AI allow for quick design and implementation of interventions, even in very rural parts of the world. Case studies involve HIV, Ebola, and COVID interventions.
WoComToQC workshop lecture on Forman-Ricci curvature for applications in industry (social networks, disaster logistics, spatial data, and spatiotemporal goods pricing data).
PyData Global talk covering tools from geometry/topology and their uses in public health, public policy, and social good initiatives. Examples include food price prediction, COVID policies, public health interventions, and fair AI.
Data Science Dojo Talk on comparing time series using persistent homology. Short overview of time series data. A bit of topology. Code available. Example includes stock exchange data.
Statistical and topological algorithm piece of an Applied Machine Learning Days Morocco talk. Covers ARIMA models, SSA models, GEE models, and persistent homology. Applications include pricing data, stock data, development data, and healthcare data. Datasets and full presentation can be found on GitHub: https://github.com/gabayae/Time-Series-Applications_AMLD2022
An introduction to quantum machine learning.pptx - Colleen Farrelly
Very basic introduction to quantum computing given at Indaba Malawi 2022. Overviews some basic hardware in classical and quantum computing, as well as a few quantum machine learning algorithms in use today. Resources for self-study provided.
Indaba Malawi workshop on basic approaches to time series data, including ARIMA models and SSA models. Example in R includes an agricultural example from historical Malawi data with Rssa package and base ARIMA models.
NLP: Challenges and Opportunities in Underserved Areas - Colleen Farrelly
This talk highlights the challenges and opportunities that exist in linguistically underserved areas. It highlights NLP initiatives in Sub-Saharan Africa, as well as financial opportunities in technology if areas neglected linguistically can produce tools in their local languages. Ethics, ownership, and other concerns are highlighted to guide development initiatives.
Geometry, Data, and One Path Into Data Science.pptx - Colleen Farrelly
Women in Data Science (Alexandria, Egypt) keynote address. Topics cover my journey into data science/machine learning, an overview of data science as a profession, and some case studies on topology/geometry in analytics. Example case studies include insurance, natural language processing, social network analysis, and psychometrics.
WiDS Alexandria, Egypt workshop in topological data analysis (Python and R code available on request), covering persistent homology, the Mapper algorithm, and discrete Ricci curvature. Examples include text data and social network data.
First part of a workshop looking at industry case studies in natural language processing for From Theory to Practice Workshop (AIMS, Kigali, March 2022).
SAS Global 2021 Introduction to Natural Language Processing - Colleen Farrelly
Overview of text data, processing of text data, integration of text data with structured databases, and uses of text data in analytics across a variety of fields. Here's the talk link: https://www.youtube.com/watch?v=wS0X1bSsuUU
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
3. Linear Algebra in Analytics
Data Analysis
Matrix Singular Value Decomposition (SVD)
Factor analysis/latent modeling
Cluster of individuals
Single time, longitudinal
Signal processing
Recommenders and Collaborative Filtering
User-item matrices
Adjacency matrix and PageRank
Mathematical Statistics
Rich theoretical history
Moment statistics
Optimization
Physics and physical chemistry modeling
[Figure: SVD of a matrix into left and right singular vectors and a diagonal matrix of singular values]
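As a minimal sketch of the matrix SVD named on this slide, the example below factors a small user-item matrix and builds a low-rank approximation. The rating values and the helper name `low_rank` are illustrative assumptions, not taken from the presentation.

```python
import numpy as np

# Hypothetical user-item rating matrix (rows: users, columns: items).
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

# Thin SVD: ratings = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)


def low_rank(U, s, Vt, k):
    """Rank-k reconstruction from the k largest singular triplets."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Keep two latent factors and measure how much of the matrix is lost.
approx = low_rank(U, s, Vt, k=2)
rel_error = np.linalg.norm(ratings - approx) / np.linalg.norm(ratings)

print("singular values:", np.round(s, 3))
print("relative error of rank-2 approximation:", round(float(rel_error), 3))
```

The columns of `U` and rows of `Vt` play the role of latent user and item factors, which is the connection to factor analysis and collaborative filtering listed above.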
4. Extensions of Linear Algebra
Vectors and matrices are common in statistics and machine learning
1-D and 2-D representations of data relationships
Many theoretical results leveraged in algorithms
What about 3-D or 4-D or 100-D representations of data relationships?
MRI slice sequence
Multimodal signals
These objects are tensors.
[Figure: vector, matrix, tensor]
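To make the vector/matrix/tensor progression concrete, here is a small sketch using NumPy arrays of order 1, 2, and 3. The shapes (for instance, five 3 x 4 "slices" standing in for an MRI slice sequence) are made up for illustration.

```python
import numpy as np

vector = np.arange(4.0)                     # order 1, shape (4,)
matrix = np.arange(12.0).reshape(3, 4)      # order 2, shape (3, 4)
# Hypothetical MRI-like stack: 5 slices, each a 3 x 4 image.
tensor = np.arange(60.0).reshape(5, 3, 4)   # order 3, shape (5, 3, 4)

for name, arr in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    print(f"{name}: order={arr.ndim}, shape={arr.shape}")

# Fixing one index of an order-3 tensor leaves an ordinary matrix.
first_slice = tensor[0]
print("first slice shape:", first_slice.shape)
```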
5. Tensor Algebra
Rich history
Physics
Gravity
Field/string theory
Fluid mechanics
Multilinear algebra
Grassmann algebra
Differential forms
Extends familiar linear algebra tools and constructs to higher-dimensional spaces
Determinants/traces
Linear mapping (space and basis transformations of topological space)
Inner products
Building complex topological objects
6. Tensors and the SVD
The SVD has multilinear extensions.
Computing the rank of a tensor is a hard problem (NP-hard in general)
Rank approximation algorithms exist for tensor decomposition
Many nice theoretical results (bounds, statistical properties…)
Can be exploited for analytics
User/item/time tensor construction for recommendation
Latent transition analysis
Multimodal signal integrated analysis
[Figure: tensor decomposition of a full tensor into factor loading matrices and a reduced tensor]
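The user/item/time tensor mentioned on this slide can be sketched as follows. The event log, index ranges, and variable names are hypothetical; a real recommender would map raw IDs and timestamps to these integer indices first.

```python
import numpy as np

# Hypothetical interaction log: (user index, item index, time period, rating).
events = [
    (0, 1, 0, 5.0),
    (0, 2, 1, 3.0),
    (1, 1, 0, 4.0),
    (2, 0, 1, 2.0),
    (2, 2, 2, 5.0),
]
n_users, n_items, n_periods = 3, 3, 3

# Order-3 tensor with one axis per entity type; unobserved cells stay at zero.
tensor = np.zeros((n_users, n_items, n_periods))
for user, item, period, rating in events:
    tensor[user, item, period] = rating

# Collapsing the time axis recovers a user-item matrix. Summation is valid
# here only because each user-item pair occurs once in this toy log.
user_item = tensor.sum(axis=2)

print("tensor shape:", tensor.shape)
print(user_item)
```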
7. Tensor Decomposition Methods
Graph-Based Composite Likelihoods
Hidden node conditional likelihood to estimate parameters
Latent Tree Graph Models
Latent Tree Graph Model formulation followed by iterative, hierarchical decomposition collapsing into matrix SVD
TripleRank
PARAFAC followed by HITS authority score on resulting graph (PageRank variant)
Latent Schatten Norms
Group LASSO/Tucker hybrid
General Decomposition
Alternating least squares
Gradient-based
Eigenvalue decomposition
[Figure labels: Level 1, Level 2]
Higher-Order Singular Value Decomposition (HOSVD)
Full dimensionality control using extended SVD method
Canonical Decomposition/PARAFAC (Tucker)
Linear combination with no orthogonality constraints (least squares algorithms like HOSVD)
Regularization and truncation
Weaker requirements for uniqueness
Tensor Unfolding
Unfold tensor along mode and perform SVD on unfolded tensor matrix
Non-Negative Tensor Factorization
Independent polynomial formulation
Follows non-negative matrix factorization algorithm (least squares)
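Two of the methods above, tensor unfolding and the higher-order SVD, can be sketched together, since a truncated HOSVD is commonly computed from the SVD of each mode unfolding. This is a minimal illustration under that assumption, not the presentation's own code; the function names `unfold`, `mode_product`, `hosvd`, and `reconstruct` and the random test tensor are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along the given mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: one factor matrix per mode plus a reduced core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of the mode unfolding span that mode's subspace.
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :rank])
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Map the core back to the original space with the factor matrices."""
    out = core
    for mode, U in enumerate(factors):
        out = mode_product(out, U, mode)
    return out


rng = np.random.default_rng(0)
full = rng.normal(size=(4, 5, 6))

# Keeping every component reproduces the tensor; truncating controls its size.
core_full, factors_full = hosvd(full, (4, 5, 6))
core_small, factors_small = hosvd(full, (2, 3, 3))
approx = reconstruct(core_small, factors_small)

print("mode-1 unfolding shape:", unfold(full, 1).shape)
print("reduced core shape:", core_small.shape)
```

The `ranks` argument is where the "full dimensionality control" noted above shows up: each mode can be truncated to a different size.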
10. Image Integration/Analysis
MRI data with many components of images per patient
Tensor decomposition to reduce dimensionality
Noise filtration
Less computationally-intensive data mining/predictive modeling
Control over dimensionality to obtain standard size components across individuals (integrate with prediction)
Integration/analysis of many types of image data (e.g., MRI + PET + fMRI)
Tensor decomposition to identify key elements within each patient’s images
Data mining
Highlighting potentially useful information for clinicians
Factor analysis extension for identifying similar components across images
Partition images corresponding to anatomy or function
Data mine factors to identify functional areas
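The dimensionality-reduction and noise-filtration points above can be illustrated with a synthetic image stack. Everything here is an assumption made for the sketch: the stack is simulated as mixtures of two spatial patterns plus Gaussian noise, and a truncated SVD of the slice-mode unfolding stands in for a fuller tensor decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stack: 20 slices of 16 x 16 pixels, each slice a mixture of
# two underlying spatial patterns (an assumption made for illustration).
patterns = rng.normal(size=(2, 16, 16))
weights = rng.normal(size=(20, 2))
clean = np.tensordot(weights, patterns, axes=(1, 0))    # shape (20, 16, 16)
noisy = clean + 0.1 * rng.normal(size=clean.shape)

# Unfold along the slice mode, keep the top-k singular directions, refold.
unfolded = noisy.reshape(20, -1)                        # shape (20, 256)
U, s, Vt = np.linalg.svd(unfolded, full_matrices=False)
k = 2
denoised = ((U[:, :k] * s[:k]) @ Vt[:k, :]).reshape(clean.shape)

# Fixed-size summary per slice: its coordinates in the k-dimensional subspace.
features = U[:, :k] * s[:k]                             # shape (20, 2)

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print("error before filtering:", round(float(err_noisy), 3))
print("error after filtering: ", round(float(err_denoised), 3))
```

The `features` array is the kind of standard-size component set that could be passed on to a predictive model.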
11. Extensions to Signal Data
These principles of integrating image data extend to other types of signals:
EEG
EKG
Pulse Oxygenation
Other biometric data collected over time from patients
Set up problem as high-dimensional tensor and apply algorithms as before
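Setting up signal data as a tensor might look like the sketch below. The signals are synthetic stand-ins, and the shared time grid, the patient count, and the choice of a patient-mode SVD as the follow-on algorithm are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_samples = 6, 50
t = np.linspace(0.0, 1.0, n_samples)

# Synthetic stand-ins for three signal types recorded on a shared time grid.
eeg = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.normal(size=(n_patients, n_samples))
ekg = np.sin(2 * np.pi * 1.2 * t) + 0.05 * rng.normal(size=(n_patients, n_samples))
spo2 = 0.97 + 0.01 * rng.normal(size=(n_patients, n_samples))

# Order-3 tensor with axes (patient, signal type, time).
signals = np.stack([eeg, ekg, spo2], axis=1)

# Patient-mode unfolding, centered, then SVD to get low-dimensional scores.
patient_matrix = signals.reshape(n_patients, -1)
centered = patient_matrix - patient_matrix.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
patient_scores = U[:, :2] * s[:2]

print("tensor shape:", signals.shape)
print("patient score shape:", patient_scores.shape)
```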