The document discusses the challenge of dealing with the ever-growing amount of data, especially data from social media interactions. It describes this issue as the "inflation epoch" where content is rapidly multiplied through sharing and interactions. It then summarizes four common methodologies used to make this large amount of social data actionable: 1) using indexes and search to structure and retrieve data, 2) extracting features from content to add context, 3) visualizing data to find patterns, and 4) using crowdsourcing and curation to filter information. The conclusion reiterates that the amount of content is growing exponentially so methods to manage it must scale equally.
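The first of those four methodologies (indexes and search) can be illustrated with a minimal sketch. The post texts and the `search` helper below are invented for illustration, not taken from the document; they show how an inverted index lets similar social media content be retrieved quickly.

```python
from collections import defaultdict

# Hypothetical social media posts, keyed by post id.
posts = {
    1: "flooding reported downtown after heavy rain",
    2: "heavy traffic downtown this morning",
    3: "rain expected again tomorrow",
}

# Build an inverted index: token -> set of post ids containing it.
index = defaultdict(set)
for post_id, text in posts.items():
    for token in text.lower().split():
        index[token].add(post_id)

def search(query):
    """Return ids of posts containing every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = index.get(tokens[0], set()).copy()
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

print(sorted(search("downtown heavy")))
```

At real social media scale the same idea is served by dedicated search infrastructure, but the structure (token to document-id mapping) is the same.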
Democratizing Data to transform gov., business & daily life - W. David Stephenson
A speech to the Tableau Customer Conference 2009 based on the author's forthcoming "Democratizing Data" book, arguing that a combination of real-time structured data feeds and tools such as the Tableau visualization software can empower entire workforces, cut operating costs, encourage cooperation, and foster crowdsourcing.
Philosophy of Big Data: Big Data, the Individual, and Society - Melanie Swan
Philosophical concepts elucidate the impact the Big Data Era (exabytes per year of scientific, governmental, corporate, and personal data being created) is having on our sense of ourselves as individuals in society, as information generators in constant dialogue with the pervasive information climate.
This summarizes my concept of a transformation in which data is only entered once (by government, businesses or the public), automatically tagged with metadata, and then flows, preferably on a real-time basis, to anyone who needs it (limited only by their roles), plus tools to use and interpret the data. The results will be new goods & services, transparency, and economical operations!
Philosophy of Biological Cell Repair informs Geoethical Nanotechnology: Cellular repair is an age-old function in biology. This talk examines the cellular process of repair in philosophical terms. Biologically, wound-healing is the primary form of cellular repair, drawing on numerous cell types and the extracellular matrix to perform a variety of operations during the phases of inflammation, proliferation, and maturation. Philosophically, these functions can be discussed from a systems theory perspective, through the concept pairs of parts-whole, autonomy-dependency, self-other, sickness-wellness, and scarcity-abundance. Understanding cellular repair at the theory level could facilitate the development of nanotechnology solutions that augment biological processes in ways that are geoethically congruent with nature's ethos.
Validation of Dunbar's number in Twitter conversations - augustodefranco
Bruno Goncalves (1,2), Nicola Perra (1,3), and Alessandro Vespignani (1,2,4)
(1) Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University, IN 47408, USA
(2) Pervasive Technology Institute, Indiana University, IN 47404, USA
(3) Linkalab, Complex Systems Computational Lab., 09100 Cagliari, Italy
(4) Institute for Scientific Interchange, Turin 10133, Italy
May 2011
A new philosophy of economics is needed that is adequate to the contemporary moment, configuring a mindset shift from 1) survival to fulfillment, 2) scarcity to abundance, and 3) centralization to decentralization
Successful societies recognize that economics is shifting to the greater production and consumption of “social goods” in complement to “material goods”
Social goods such as trust, dignity, abundance, opportunity, creative expression, fulfillment, challenge, collaboration, status, certainty, availability, contingency, willingness, cognitive surplus
Claim: societies with less income inequality have greater cohesion and trust, and are better poised to move more quickly into the abundance economics of the future
Blockchain technology is a key mechanism for building new forms of societal shared trust
Modern society arrived with trust beyond kinship groups; similar expansion now beyond hierarchical models with decentralization
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness - Amit Sheth
Amit Sheth, "Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness", Keynote at
From E-Gov to Connected Governance: the Role of Cloud Computing, Web 2.0 and Web 3.0 Semantic Technologies, Falls Church, VA, February 17, 2009. http://semanticommunity.wik.is/
This talk provides a speculative contemplation of philosophical topics that might arise with brain-machine interface technology and explores the new ways that individuals and society might self-enact as a result. Brain-machine interfaces that could be pervasive, continuous, and widely-adopted suggest interesting new possibilities for our future selves. From a philosophical perspective, these possibilities concern the definition of what it is to be human, our current existence and interaction with reality, and how all of this could be dramatically different in a scenario of digitally-linked cloudmind collaborations. This talk looks at some of the foundational ontological questions of how the progression of the existence of the classic human might evolve. Perhaps the most pressing question that currently-minded potential adopters have is how to avoid getting irreparably pulled into a groupmind. To protect against this, there could be an expansion and letting go of the term and concepts of personal identity, and humans as a unit of organization, in favor of instead self-relying on a decentralized permissioning structure like blockchain technology for managing empowered and resilient crowdmind participations.
Describing the design of PowerMeeting (a Web browser-based real-time groupware system) and comparing it with Google Wave and ThinkTank, using Tim Berners-Lee's Web science process as a framework.
Towards the Design of Intelligible Object-based Applications for the Web of Things - Pierrick Thébault
Presentation given at the Second International Workshop on the Web of Things (in conjunction with the Ninth International Conference on Pervasive Computing, San Francisco, USA, June 2011).
More details on http://www.wothings.com.
Meliorating usable document density for online event detection - IJICTJOURNAL
Online event detection (OED) has seen a rise in the research community because it can quickly identify possible events happening around the world. By grouping similar documents shared over social media by users, these systems can surface potential events well before they are reported by the news media. Most OED systems use textual similarity for this purpose. Groups of similar documents that may indicate a potential event are further strengthened by the replies made by other users, improving confidence in the cluster. However, these reply documents are at times unusable as independent documents because they replace previously mentioned noun phrases with pronouns, causing OED systems to fail to group the replies into their appropriate clusters. In this paper, a pronoun resolution system is proposed that replaces pronouns with the relevant nouns in social media data. Results show a significant improvement in performance using the proposed system.
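A heavily simplified sketch of that idea (not the paper's actual system): replace a bare pronoun in a reply with an antecedent guessed from the parent post. The `last_capitalized_phrase` heuristic, the pronoun list, and the example texts are all assumptions for illustration; a real resolver would use parsing and gender/number agreement.

```python
import re

# Toy set of pronouns to resolve.
PRONOUNS = {"it", "he", "she", "they", "this"}

def last_capitalized_phrase(text):
    """Crude antecedent guess: the last run of capitalized words in the text."""
    matches = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)
    return matches[-1] if matches else None

def resolve_reply(parent, reply):
    """Replace bare pronouns in a reply with an antecedent from the parent post."""
    antecedent = last_capitalized_phrase(parent)
    if antecedent is None:
        return reply
    out = []
    for word in reply.split():
        out.append(antecedent if word.lower().strip(".,!?") in PRONOUNS else word)
    return " ".join(out)

parent = "Earthquake hits Nepal this morning"
reply = "it was terrible"
print(resolve_reply(parent, reply))
```

After resolution, the reply shares a noun phrase with its parent, so text-similarity clustering can place it in the same event cluster.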
Strategic scenarios in digital content and digital business - Marco Brambilla
This lesson was given in May 2009 at MIP, Politecnico di Milano. The audience included members of the Acer academy program.
Rights on reused content are maintained by respective owners.
See further information on my activity at:
http://home.dei.polimi.it/mbrambil/
and:
http://twitter.com/marcobrambi
Abstract - This paper focuses on determining relationships between pairs of objects in Wikipedia, where each page can be regarded as a separate object. Two classes of relationships exist between two objects in Wikipedia: an explicit relationship is represented by a single link between the two pages for the objects, while an implicit relationship is represented by a link structure containing the two pages. Some previously proposed techniques for determining relationships are cohesion-based; these underestimate objects with high degree values, even though such objects can be significant in constituting relationships in Wikipedia. Other techniques are inadequate for determining implicit relationships because they use only one or two of three important factors: distance, connectivity, and cocitation.
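A minimal sketch of the two relationship classes on a toy link graph. The pages and links below are invented for illustration; cocitation is shown as one of the three factors named in the abstract (distance and connectivity would be computed similarly from the same graph).

```python
# Hypothetical miniature link graph: page -> set of pages it links to.
links = {
    "Einstein": {"Physics", "Relativity"},
    "Bohr": {"Physics", "Quantum"},
    "Physics": {"Science"},
    "Relativity": {"Physics"},
    "Quantum": {"Physics"},
}

def explicit(a, b):
    """Explicit relationship: a single direct link between the two pages."""
    return b in links.get(a, set()) or a in links.get(b, set())

def cocitation(a, b):
    """One implicit-relationship factor: count of pages linking to both a and b."""
    citers_a = {p for p, out in links.items() if a in out}
    citers_b = {p for p, out in links.items() if b in out}
    return len(citers_a & citers_b)

print(explicit("Einstein", "Bohr"))      # no direct link between the two pages
print(cocitation("Physics", "Quantum"))  # pages citing both
```

Einstein and Bohr have no explicit link here, yet both link to Physics, which is the kind of link-structure evidence an implicit-relationship measure draws on.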
Digital identity is fundamental to collaboration in bioinformatics research and development because it enables attribution, contribution, and publication to be recorded and quantified.
However, current models of identity are often obsolete and have problems capturing both small contributions ("microattribution") and large contributions ("mega-attribution") in science. Without adequate identity mechanisms, the incentive for collaboration is reduced and the utility of collaborative social tools hindered.
Using examples of metabolic pathway analysis with the Taverna workbench and myexperiment.org, this talk illustrates problems and solutions for identifying scientists accurately and effectively in collaborative bioinformatics networks on the Web.
This presentation provides an outlook on what we anticipate with the structured data hub: creating linkable datasets, enhancing the use of provenance, adding quality flags to data, answering new questions and, finally, borrowing from and contributing to public sources such as DBpedia.
Film + TV Private Finance Insights 2022 [Report] - Jon Gosier
This report shares insights about how the Film/TV industry is evolving from a financial data perspective.
Insights were derived from hundreds of productions prior to release. This is the opposite of box office data: it is the financial data from Film/TV productions prior to completion and release.
To respect the privacy of the individual filmmakers and producers, we have abstracted the names of each production analyzed.
SCALETECH 2022: What Scaling $300M Startups Taught me about Scaling $300M Movies - Jon Gosier
Jon Gosier's slides from a presentation given at Scaletech 2022 in Toronto, Canada on 10/13/2022.
Prior to FilmHedge, a fin-tech that has changed the way major Film and TV projects are financed, Jon founded programmatic advertising company Audigent which now dominates the 1st-party entertainment space.
Using 6 films (and one song) as an example, here are my tips for scaling from startup to growth-stage:
1. Zack and Miri - Vet your co-founder based on your mutual interest in and obsessiveness about solving a problem
2. Suicide Squad - hire to complement your weaknesses as a Founder
3. The Expendables - know when to cut your losses with a bad hire or someone in the wrong role
4. Moneyball - track everything, define KPIs (not just for the big obvious stuff), milestones are just KPIs over time, KPIs keep your team focused
5. The French Dispatch - your company's relationship with the press will change as you grow, use the press to help re-educate the market and change behavior
6. "Know When to Hold em/Fold em" - know when to get out of your own way as a Founder, to build something bigger than yourself (by definition) sometimes you don't even need to be there anymore (graduate as a founder).
7. Avengers: Endgame - know your 'endgame'. Are you building for lifestyle, legacy, or a liquidity event/exit?
My talk from Marketing Analytics and Data Science 2018 on April 11, illustrating how audience/fan/user metadata is being used by executives in music and sports.
Do recording artists pick up on emerging audience trends, or do audiences follow artist trends? The answer is a bit of both. In this talk, data scientist Jonathan Gosier shares methods the music industry is learning to use to apply data science to music listening audiences to boost revenue, maximize reach, and improve engagement.
This presentation is about how to design technologies and business models that yield better outcomes for society, reduce risk, and, as a result, maximize value.
It was delivered by Jon Gosier on June 26th, 2015 at Maine Startup and Create Week 2015 in Portland, ME.
IOT4I Internet of Things for Impact | Netherlands - Jon Gosier
A presentation delivered by data scientist Jon Gosier at the DOEN Foundation in the Netherlands on how IOT can be used at the intersection of people, profit, and planet.
For more visit Gosier.org
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Paper] - Jon Gosier
This paper outlines a study conducted in Mombasa, Kenya, where real-time consumer data collection techniques (also known as big data, real-time data, crowdsourced data, or open source data) were used to prove or disprove hypotheses about macroeconomic trends. It concludes that there are many reasons to feel confident that these techniques may serve as sufficient alternatives for economic forecasts in countries where traditional means of microeconomic data collection are sparse due to poor infrastructure and other circumstances. Further research is needed to verify the repeatability of these findings and the statistical soundness of the methods.
A summary presentation can be found here - http://www.slideshare.net/jongos1/predicting-macrodeck
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Slides] - Jon Gosier
This deck outlines a study conducted in Mombasa, Kenya, where real-time consumer data collection techniques (also known as big data, real-time data, crowdsourced data, or open source data) were used to investigate hypotheses about macroeconomic trends. It concludes that there are many reasons to feel confident that these techniques may serve as sufficient alternatives for economic forecasts in countries where traditional means of microeconomic data collection are sparse due to poor infrastructure and other circumstances. Further research is needed to verify the repeatability of these findings and the statistical soundness of the methods.
The full paper can be found here - http://www.slideshare.net/jongos1/predicting-macroeconomic-trends-through-realtime-mobile-data-collection
21st Century Strategies for Financial Inclusion - Jon Gosier
The wealth of Black American households was decimated in 2008. This white paper outlines a strategy for structuring new investment instruments for Black Americans and other minority communities.
About Ebola Deeply - Slides from Foreign Press Center - Jon Gosier
Slides from the Interview with Jon Gosier (Ebola Deeply) and Nicole Walden (International Rescue Committee) held November 7th, 2014 by the U.S. State Department. The full transcript can be found at - http://fpc.state.gov/233836.htm
Ebola Deeply is an impact journalism project founded by Isha Sesay (CNN), Jon Gosier (Appfrica/D8A), Lara Setrakian (News Deeply), James Andrews (True Story), Azeo Fables (News Deeply), and Bahiyah Robinson (Appfrica/D8A).
In this press conference we also unveiled our Mobile Wire service, a solution that greatly simplifies and improves targeting fragmented mobile user audiences in developing countries.
Using Predictive Analytics for Anticipatory Investigation and Intervention - Jon Gosier
The proliferation and adoption of data, sensors, mobile phones, and social media technology present new ways of capturing conversations surrounding events in real time. There is high demand for products that allow law enforcement, criminal investigators, and others to explore events by monitoring many transmedia sources (social media such as Facebook and Twitter, photos, and news sources) and relating that activity to historic data sets such as neighborhood maps, crime databases, and other digital records.
Using a combination of the data-analysis products available from D8A Group, we have been monitoring unfolding events in real time to illustrate the ways our technology platforms can be used by public safety officials to make data-informed public safety decisions in real time.
Data Mining Online Audiences with D8A Group - Jon Gosier
Using a combination of the data-analysis products available from D8A Group, we have been monitoring unfolding events in real time to illustrate the ways our technology platforms can be used by companies, PR firms, marketing agencies, political groups, celebrities, and NGOs to make data-informed decisions in real-time crisis scenarios.
In this case study document, we analyze breaking news scenarios involving Chris Christie's Bridgegate scandal, Kerry Washington's appearance at the Golden Globes, and the Knight Foundation, which we were not aware had any news events at the time; we quickly became aware of two through the use of our software.
The primary purpose of using technologies like the D8A suite of analytic products is to monitor and capture real-time data for analysis and research. They are also predictive, helping to surface trends, patterns, and happenings before one might find out about them otherwise. D8A’s products work across multiple communication channels.
Data-Driven Crisis Monitoring: Turning Online Activity into Actionable Insights - Jon Gosier
The proliferation and adoption of mobile phones and social media technologies present new ways of capturing conversations surrounding crisis events in real time. This allows researchers, analysts, and first-responders to explore events by monitoring many media sources (blogs, photos, web feeds, news sources, and tweets) from one environment.
The tragic situation unfolding in South Sudan is complex and evolving rapidly. The rate at which the fledgling state has descended into political and social unrest is distressing and highlights the need for urgent intervention. Thus, having ways to identify and engage influencers and to anticipate and potentially mitigate disastrous scenarios is greatly needed.
Using a combination of the data-analysis products available from D8A Group, we have been monitoring the unfolding events in real time to illustrate ways our technology platforms can be used by NGOs, first-responders, civil society organizations, and government agencies to make data-informed decisions in real time in crisis scenarios.
The Humanitarian Face of Big Data | ICCM 2013 - Jon Gosier
What's the big deal with Big Data? How does it affect the humanitarian community? Is it all hype? What are the benefits and potential applications? This talk explores the 'humanitarian face of big data'.
CREDITS
========
Smolan, R. (2012) Human Face Of Big Data. Against All Odds Publications
Balet, C. (2013) Strangers in the Light. Gerhard Steidl Druckerei und Verlag
Photos by Josh Grow, Catherine Balet, & Peter Menzel
Cited Projects: D8A.com, AidData.org, un.data.gov, Thomson Reuters, appfrica.com, MetaLayer.com, statfrica.com, SiftDeck.com
One of the most intimidating things about being an entrepreneur or founder of an organization can be the first steps of incorporating and legally structuring things.
The good thing to note is that if you make a mistake at the beginning, it's almost always addressable down the road (though it can require a lot of time or significant cost).
What's the best legal structure for your social enterprise? Why do these different models exist?
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl ... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions), and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GraphRAG is All You need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
DevOps and Testing slides at DASA Connect - Kari Kakkonen
My slides, with Rik Marselis, from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We also held a lovely workshop in which participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 – Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Generative AI Deep Dive: Advancing from Proof of Concept to Production – Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Epistemic Interaction - tuning interfaces to provide information for AI support – Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
The New Challenge of Data Inflation
IQT QUARTERLY
THE NEW CHALLENGE OF DATA INFLATION
By Jon Gosier
The online chatter of individuals, networks of friends or professionals, and the data being
created by networked devices are growing exponentially, although our time to consume
it all remains depressingly finite. There is a plethora of solutions that approach the
challenge of data inflation in different ways. But what methodologies have worked at
scale and how are they being used in the field?
The Infinite Age

First, let's look at the landscape. The amount of content being created daily is incredible, and includes the growth of image data, videos, networked devices, and the contextual metadata created around these components.

The Information Age is behind us; we're now living in an Infinite Age, where the flow of data is ever-growing and ever-changing. The life of this content may be ephemeral (social media) or mostly constant (static documents and files). In either case, the challenge is dealing with it at human scale.

In the time it takes to finish reading a given blog post, chances are high that it's been retweeted, cited, aggregated, pushed, commented on, or 'liked' all over the web. An all-encompassing verb to describe this is sharing. All of this sharing confuses our ability to define the essence of that original content because value has been added at different points by people who've consumed it.

Does that additional activity become a part of the original content, or does it exist as something else entirely? Is the content the original blog post, or is it that post coupled with the likes, retweets, comments, and so on?

Content is now born and almost instantaneously multiplies, not just through copying, but also through the interactions individuals have with it — making it their own and subsequently augmenting it.

In the Infinite Age, one person may have created the content, but because others consumed it, it's as much a representation of the reader as it is the author.

The Inflation Epoch

In physics, the moment just after the Big Bang, but prior to the early formation of the universe as we understand it, is called the Planck Epoch or the GUT Era (for the Grand Unified Theory that explains how all four forces of nature were once unified). The Planck Epoch is defined as the moment of accelerated growth when the universe expanded from a singularity smaller than a proton.

06 Vol. 3 No. 3 Identify. Adapt. Deliver. ™

We mirror this moment a trillion times each day when online content is created and augmented by the
surrounding activity from others online. From the moment we publish, the dogpile of interaction that follows is a bit like the entropy that followed the Big Bang: exponential and rapid before gradually trailing off to a semi-static state where people may still interact, but usually with less frequency, before leaving to consume new content being published elsewhere.

When people talk about big data, this is one of the problems they are discussing. The amount of content that is created daily can be measured in the billions (articles, blog posts, tweets, text messages, photos, etc.), but that's only where the problem starts. There's also the activity that surrounds each of those items — comment threads, photo memes, likes on Facebook, and retweets on Twitter. Outside of this social communication, there are other problems that must be solved, but for the sake of this article we'll limit our focus to big data as it relates to making social media communication actionable.

How are people dealing with social data at this scale to make it actionable without becoming overwhelmed?

Methodology 1 – Index and Search

Search technology as a method for sorting through incredibly large data sets is the one that most people understand because they use it regularly in the form of Google. Keeping track of all the URLs and links that make up the web is a seemingly Sisyphean task, but Google goes a step beyond that by crawling and indexing large portions of public content online. This helps Google perform searches much faster than having to crawl the entire web every time a search is executed. Instead, connections are formed at the database level, allowing queries to occur faster while using fewer server resources. Companies like Google have massive data centers that enable these indexes and queries to take seconds.

As big data becomes a growing problem inside organizations as much as outside, index and search technology has become one way to deal with data. A new swath of technologies allows organizations to accomplish this without the same level of infrastructure because, understandably, not every company can afford to spend like Google does on its data challenges. Companies like Vertica, Cloudera, 10Gen, and others provide database technology that can be deployed internally or across cloud servers (think Amazon Web Services), which makes dealing with inflated content easier by structuring at the database level so that retrieving information takes fewer computing resources.

This approach allows organizations to capture enormous quantities of data in a database so that it can be retrieved and made actionable later.

Methodology 2 – Contextualization and Feature Extraction

Through the development of search technologies, the phrase "feature extraction" became common terminology in information retrieval circles. Feature extraction uses algorithms to pull out the individual nuances of content. This is done at what I call an atomic level, meaning any characteristic of data that can be quantified. For instance, in an email, the TO: and FROM: addresses would be features of that email. The timestamp indicating when that email was sent is also a feature. The subject would be another. Within each of those there are more features as well, but this is the general high-level concept.

[Figure: stacked waveform graph used to plot data with values in a positive and negative domain (like sentiment analysis); waveform 1 shows the value for all categories over time, waveform 2 for a sub-category, and waveform 3 for a disparate category. Produced by the author using metaLayer.]
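The atomic email features described above can be sketched in a few lines. The following is a minimal, hypothetical illustration using Python's standard-library email parser (not metaLayer's actual tooling); the message and addresses are invented:

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# An invented raw message, for illustration only.
RAW = """\
From: alice@example.org
To: bob@example.org
Subject: Storm update
Date: Sat, 27 Aug 2011 14:05:00 -0400

Irene just made landfall.
"""

def extract_features(raw):
    """Pull atomic, quantifiable features out of one message."""
    msg = message_from_string(raw)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        # Normalize the timestamp so it can be sorted and segmented.
        "sent_at": parsedate_to_datetime(msg["Date"]).isoformat(),
    }

features = extract_features(RAW)
print(features["sent_at"])  # 2011-08-27T14:05:00-04:00
```

Each extracted field is a feature in the article's sense: a quantifiable characteristic that can later be indexed, filtered, or visualized on its own.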
IQT QUARTERLY WINTER 2012 Vol. 3 No. 3 07
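Returning to Methodology 1, the database-level indexing that makes retrieval cheap can be sketched as a toy inverted index that maps each term to the documents containing it. This is a minimal illustration of the general technique, not any particular vendor's implementation; the documents are invented:

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term."""
    term_sets = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

docs = {
    1: "hurricane irene hits the east coast",
    2: "retweets multiply the original post",
    3: "hurricane relief crowdsourcing effort",
}
index = build_index(docs)
print(sorted(search(index, "hurricane")))         # [1, 3]
print(sorted(search(index, "hurricane effort")))  # [3]
```

Because the index is built once up front, each query is a cheap set lookup rather than a scan of every document — the same trade-off, in miniature, that the article describes at web scale.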
For the most part, search uses such features to help improve the efficiency of the index. In contextualization technologies, the practice is to use these features to modify the initial content, adding them as metadata and thereby creating a type of inflated content (as we've augmented the original).

When users drag photos onto a map on Flickr, they are contextualizing them by tagging the files with location data. This location data makes the inflated content actionable; we can view it on a map, or we can segment photos by where they were taken. This new location data is an augmentation of the original metadata, and creates something that previously did not exist. When we're dealing in the hundreds of thousands, millions, and billions of content items, feature extraction is used to carry out the previous example at scale.

The following is a real-world use case, though I've been careful not to divulge any of the client's proprietary details. Recently at my company metaLayer, a colleague came to us with one terabyte of messages pulled from Twitter. These messages had been posted during Hurricane Irene. The client needed to generate conclusions about these messages that were not going to be possible in their original loosely structured form. He asked us to help him structure this data, find the items he was looking for (messages about Hurricane Irene or people affected by it), and extract the features that would be useful for him.

The client lacked sufficient context to identify what was most relevant in the data set. So we used our platform to algorithmically extract features like sentiment, location, and taxonomic values from each Twitter message using natural language processing. Because it was an algorithmic process, this only took a few hours, allowing the client to get a baseline that made the rest of his research possible. Now the data could be visualized or segmented in ways that weren't possible with the initial content. The inflated content — metadata generated algorithmically — included elements that made his research possible. We now had an individual profile of every message that gave us a clue about its tone, location of origin, and how to categorize the message. This allowed the client's team to look at the data with a new level of confidence.

In the context of our social data challenge, these extracted features might be used on their own by an application, or they might become part of an index or database like the one mentioned previously.

[Infographic: network analysis of the SwiftRiver platform. Produced by the author using Circos.]

Methodology 3 – Visualization

Visualization is another way to deal with excessive data problems. Visualization may sound like an abstract term, but the visual domain is actually one of the most basic things humans can use for relating complex concepts to one another. We've used symbols for thousands of years to do this. Data visualizations and infographics simply use symbols to cross barriers of understanding about the complexities of research so that, regardless of expertise, everyone involved has a better understanding of a problem.

Content producers like the New York Times have found that visualizations are a great way to increase audience engagement with news content. The explosion of interest in infographics and data visualizations online echoes this.

To visualize excessive data sets, leveraging some of the previous methods makes discovering hidden patterns and trends a visual, and likely more intuitive, process.

Methodology 4 – Crowd Filtering and Curation

The rise of crowdsourcing methodologies presents a new framework for dealing with certain types of information overload. Websites like Digg and Reddit are examples of using crowdsourcing to vet and prioritize data by the cumulative interests of a given community. On these websites, anyone can contribute information, but only the material deemed to be
interesting by the highest number of people will rise to the top. By leveraging the community of users and consumers itself as a resource, the admin passes the responsibility of finding the most relevant content to the users.

This, of course, won't work in quite the same way for an organization's internal project or your data mining project, but it does work if you want to limit the information collected to those in the crowd who have some measure of authority or authenticity.

The news curation service Storyful.com is a great example of using crowd-sourced information to report on breaking events around the globe. Its system works not because the masses are telling the stories (that would lead to unmanageable chaos), but because the staff behind Storyful has pre-vetted contributors. This is known as bounded crowdsourcing, which simply means to extend some measure of authority or preference to a subset of a larger group. In this case, the larger group is anyone using social media around the world, whereas the bounded group is only those around the world that Storyful's staff has deemed to be consistent in their reliability, authority, and authenticity. This is commonly referred to as curating the curators.

Curation has risen as a term over the past decade to refer to the collection, intermixing, and re-syndication of inflated content. It is used to refer to the construction of narratives without actually producing any original content, instead taking relevant bits of content created by others and using them as the building blocks for something new. This presentation of public data as edited by others represents a new work, though this "new work" may be made up of nothing original at all. By carefully selecting curators whom you know will be selective in what they curate, the aggregate of information produced should be of a higher quality.

Conclusion

Big data, like most tech catchphrases, means different things to different people. It can refer to managing an excess of data, to an overwhelming feed of data, or to the rapid proliferation of inflated content due to the meta-values added through sharing. Pulling actionable information from streams of social communication represents a unique challenge in that it embodies all aspects of the phrase and the accompanying challenges. Ultimately, if content is growing exponentially, the methods to manage it have to be capable of equal speed and scale.
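The crowd filtering and bounded crowdsourcing ideas in Methodology 4 reduce to a simple mechanism: count only the votes of pre-vetted contributors when ranking content. A minimal sketch, with invented contributor names and items (not Storyful's or Reddit's actual systems):

```python
# Bounded crowdsourcing, in miniature: only votes from pre-vetted
# contributors count toward the ranking ("curating the curators").
VETTED = {"amira", "jonas", "li"}

votes = [
    ("flood photo", "amira"),
    ("flood photo", "li"),
    ("flood photo", "random_user"),  # ignored: not pre-vetted
    ("rumor post", "random_user"),   # ignored: not pre-vetted
    ("eyewitness clip", "jonas"),
]

def rank(votes, vetted):
    """Count votes from vetted contributors; rank items by count."""
    counts = {}
    for item, voter in votes:
        if voter in vetted:
            counts[item] = counts.get(item, 0) + 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

print(rank(votes, VETTED))  # [('flood photo', 2), ('eyewitness clip', 1)]
```

Note that the unvetted "rumor post" never surfaces at all: bounding the crowd is what keeps the signal-to-noise ratio manageable.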
Jon Gosier is the co-founder of metaLayer Inc., which offers products for visualization, analytics, and the structuring of indiscriminate data sets. metaLayer gives companies access to complex visualization and algorithmic tools, making intelligence solutions intuitive and more affordable. Jon is a former staff member at Ushahidi, an organization that provides open source data products for global disaster response, where he worked on signal-to-noise problems with international journalists and defense organizations as Director of its SwiftRiver team.