1) The document discusses univariate distribution relationships and provides code examples that generate and plot Bernoulli, binomial, and normal distributions from random-variable samples.
2) The code draws samples from each distribution under varying parameter values and plots the empirical distributions alongside the corresponding theoretical distributions.
3) Confidence intervals for the normal distribution are also calculated and printed, based on the sample size, the probability level, and the parameters of the theoretical normal distribution.
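The original code is not reproduced in this summary; the following is a minimal sketch of the workflow it describes, assuming NumPy, SciPy, and Matplotlib. All parameter values here are illustrative, not taken from the original.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, p, trials, mu, sigma = 1000, 0.3, 20, 0.0, 1.0  # illustrative parameters

# Draw samples from each distribution.
bern = rng.binomial(1, p, size=n)         # Bernoulli(p)
binom = rng.binomial(trials, p, size=n)   # Binomial(trials, p)
norm = rng.normal(mu, sigma, size=n)      # Normal(mu, sigma)

# Plot the empirical histogram against the theoretical density
# (shown for the normal case; the discrete cases are analogous).
x = np.linspace(norm.min(), norm.max(), 200)
plt.hist(norm, bins=30, density=True, alpha=0.5, label="empirical")
plt.plot(x, stats.norm.pdf(x, mu, sigma), label="theoretical")
plt.legend()
plt.show()

# 95% confidence interval for the mean of the normal sample.
ci = stats.norm.interval(0.95, loc=norm.mean(), scale=norm.std(ddof=1) / np.sqrt(n))
print("95% CI for the mean:", ci)
```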
Big Social Data: The Spatial Turn in Big Data (Video available soon on YouTube) - Rich Heimann
Big Social Data: The Spatial Turn in Big Data
By Richard Heimann & Abe Usher
University of Maryland, Baltimore County
Webinar Description:
Increased access to spatial data and improved spatial analytical methods offer real potential for social scientific research. This webinar focuses on substantive social science research perspectives while showing researchers the rewards of applying geographic information systems (GIS), Big Data, and spatial analytics in their own work.
As the hype of Web 2.0 has worn off and collaborative use of the Internet has become a societal norm, we are witnessing an unprecedented explosion in the creation and analysis of geospatial data. Just as major governments are reducing their investments in location intelligence, individuals and non-governmental organizations are fueling a bonfire of innovation in the world of GIS data.
Traditional spatial analyses grew up in an era of sparse data and very weak computational power. Today, both of those circumstances are reversed, and many of the old solutions are no longer suitable for answering today's questions.
"Big Social Data: The Spatial Turn in Big Data" reflects this change and combines two things which, until recently, engaged quite different groups of researchers and practitioners. Together, they require particular techniques and a sophisticated understanding of the special problems associated with spatial social data. Geographic Data Mining, or Geographic Knowledge Discovery, is not new, but is developing and changing rapidly as both more, and different, data becomes available, and people see new applications. The days of ‘Big Data’ require fresh thinking.
The webinar will highlight connections between spatial concepts and data availability. Emerging social media data will be promoted over traditional social science data, as they better reflect some of the more recent developments in Big Data, most notably the socially critical exploration of such data.
Estimators for structural equation models of Likert scale data - Nick Stauner
Which estimation method is optimal for structural equation modeling (SEM) of Likert scale data? Conventional SEM assumes continuous measurement, and some SEM estimators assume a multivariate normal distribution, but Likert scale data are ordinal and do not necessarily resemble a discretized normal distribution. When treated as continuous, these data may still be skewed due to item difficulty, choice of population, or various response biases. One can fit an SEM to a matrix of polychoric correlations, which estimate latent, continuous constructs underlying ordinally measured variables, but polychoric correlations also assume these latent factors are normally distributed. To what extent are these methods robust with continuous versus ordinal data and with varying degrees of skewness and kurtosis? To answer, I simulated 10,000 samples of multivariate normal data, each consisting of 500 observations of five strongly correlated variables. I transformed each consecutive sample to an incrementally greater degree to increase skew and kurtosis from approximately normal levels to extremes beyond six and 30, respectively. I then performed five confirmatory factor analyses on each sample using five different estimators: maximum likelihood (ML), weighted least squares (WLS), diagonally weighted least squares (DWLS), unweighted least squares (ULS), and generalized least squares (GLS). I compared results for continuous and discretized (ordinal) data, including loadings, error variances, fit statistics, and standard errors. I also noted the frequency of estimation failures, which complicated the calculation of polychoric correlations and particularly plagued the WLS estimator. WLS estimation produced relatively biased loadings and error variance estimates. GLS also underestimated error variances. Neither estimator exhibited any unique advantage to offset these disadvantages. ML estimated parameters more accurately, but some of its fit statistics appeared biased, especially in the context of extreme nonnormality. Specifically, the chi-squared goodness-of-fit test statistic and the root mean square error of approximation (RMSEA) began higher with ML-estimated SEMs of approximately normal data, and worsened sharply with greater nonnormality. The Tucker-Lewis Index (TLI) and standardized root mean square residual (SRMR) also worsened, more moderately, with nonnormality when using ML estimation. GLS-estimated fit statistics shared ML’s sensitivity to nonnormality, and were even worse for the TLI and SRMR. Results generally favored the ULS and DWLS estimators, which produced accurate parameter estimates, good and robust fit statistics, and small standard errors (SEs) for loadings. DWLS tended to produce smaller SEs than ULS when skewness was below three, but ULS SEs were more robust to nonnormality and smaller with extremely nonnormal data. ML SEs were larger for loadings, but smaller for error variance estimates, and fairly robust to nonnormality...
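As a rough illustration of the data-generation step this abstract describes, the sketch below (assuming NumPy and SciPy) simulates one sample of five strongly correlated variables, applies an illustrative transform to induce skew and kurtosis, and discretizes the result into 5-point Likert categories. The study's actual incremental transforms, correlation strength, and cut points are not specified here, so the choices below are stand-ins; fitting the five CFAs with the different estimators would additionally require an SEM package.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_obs, n_vars, r = 500, 5, 0.7  # 500 observations of 5 correlated variables

# Draw one multivariate normal sample with strong pairwise correlations.
cov = np.full((n_vars, n_vars), r)
np.fill_diagonal(cov, 1.0)
z = rng.multivariate_normal(np.zeros(n_vars), cov, size=n_obs)

# Induce skew and kurtosis (an illustrative lognormal transform,
# standing in for the study's incremental transformations).
x = np.exp(z)

# Discretize into 5-point Likert categories with fixed cut points,
# so the ordinal data retain the induced skew.
likert = 1 + np.digitize(x, [0.5, 1.0, 2.0, 4.0])

print("skewness:", stats.skew(x, axis=0))
print("excess kurtosis:", stats.kurtosis(x, axis=0))
```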
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues (see the validation sketch after this section).
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
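As a concrete illustration of the automated validation checks in point 4 above, here is a minimal Python sketch assuming pandas; the column names and validation rules are hypothetical, not a prescribed implementation.

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality violations found in df."""
    errors = []
    if df["customer_id"].isna().any():
        errors.append("customer_id contains nulls")
    if df["customer_id"].duplicated().any():
        errors.append("customer_id contains duplicates")
    if (df["order_total"] < 0).any():
        errors.append("order_total contains negative values")
    return errors

# Example run: flags the duplicate id and the negative total.
df = pd.DataFrame({"customer_id": [1, 2, 2], "order_total": [10.0, -5.0, 3.0]})
print(validate(df))
```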
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices, i.e., vertices with the same in-links, avoids duplicate computations and can also reduce iteration time. Road networks often contain chains that can be short-circuited before the PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
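As a rough illustration of the first optimization, skipping converged vertices, here is a minimal Python sketch of power iteration with per-vertex convergence flags. The graph representation (in-neighbor lists), damping factor, and tolerance are illustrative; this is not the STICD implementation, and it assumes the graph has no dangling nodes.

```python
def pagerank_skip_converged(in_nbrs, out_deg, damping=0.85, tol=1e-10, max_iter=100):
    n = len(in_nbrs)
    rank = [1.0 / n] * n
    converged = [False] * n
    for _ in range(max_iter):
        changed = False
        new_rank = rank[:]
        for v in range(n):
            if converged[v]:
                continue  # vertex already converged: skip its update entirely
            r = (1.0 - damping) / n + damping * sum(
                rank[u] / out_deg[u] for u in in_nbrs[v]
            )
            if abs(r - rank[v]) < tol:
                converged[v] = True  # freeze this vertex's rank from now on
            else:
                changed = True
            new_rank[v] = r
        rank = new_rank
        if not changed:
            break  # every vertex has converged
    return rank

# Example: a 3-vertex cycle 0 -> 1 -> 2 -> 0; all ranks converge to 1/3.
in_nbrs = [[2], [0], [1]]
out_deg = [1, 1, 1]
print(pagerank_skip_converged(in_nbrs, out_deg))
```

Note the trade-off the paragraph mentions: freezing converged vertices saves per-iteration work, but a frozen rank is no longer refined if its in-neighbors keep changing, so the tolerance must be chosen conservatively.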
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas