This is a CIDR 2009 presentation. See http://infoblog.stanford.edu/ for more information and http://www-db.cs.wisc.edu/cidr/cidr2009/program.html for downloads.
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU... (ijaia)
This document summarizes a research paper that used machine learning algorithms to analyze social networks on YouTube. The researchers used unsupervised learning techniques like clustering and centrality measures to identify communities and influential users. Specifically, they used Louvain modularity and spectral clustering to detect groups for advertising purposes. Degree centrality and clique centrality were calculated to find central nodes that could be targeted for sponsorship deals. The experiments showed the algorithms could successfully find tightly-knit groups and key influencers within the larger YouTube network.
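Degree centrality and clique detection, the two measures named in the summary, can be sketched in a few lines of pure Python (the toy graph and node labels below are hypothetical; a full analysis would use a graph library such as networkx plus Louvain or spectral clustering):

```python
from itertools import combinations

# Toy undirected graph; nodes and edges are illustrative only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
nodes = sorted({v for e in edges for v in e})
adj = {v: set() for v in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Degree centrality: fraction of the other nodes a node is linked to.
n = len(nodes)
centrality = {v: len(adj[v]) / (n - 1) for v in nodes}

def largest_clique(nodes, adj):
    """Brute-force largest-clique search; fine only for toy graphs."""
    for size in range(len(nodes), 1, -1):
        for cand in combinations(nodes, size):
            if all(b in adj[a] for a, b in combinations(cand, 2)):
                return set(cand)
    return set()

print(centrality["C"])             # C touches 3 of the 4 other nodes -> 0.75
print(largest_clique(nodes, adj))  # the triangle {'A', 'B', 'C'}
```

Brute-force clique search is exponential, so real networks require the Bron–Kerbosch algorithm or a library implementation rather than this sketch.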
Methods for Intrinsic Evaluation of Links in the Web of Data (Cristina Sarasua)
The current Web of Data contains a large amount of interlinked data. However, there is still a limited understanding about the quality of the links connecting entities of different and distributed data sets. Our goal is to provide a collection of indicators that help assess existing interlinking. In this paper, we present a framework for the intrinsic evaluation of RDF links, based on core principles of Web data integration and foundations of Information Retrieval. We measure the extent to which links facilitate the discovery of an extended description of entities, and the discovery of other entities in other data sets. We also measure the use of different vocabularies. We analysed links extracted from a set of data sets from the Linked Data Crawl 2014 using these measures.
Is More Better?: Impact of Multiple Photos on Perception of Persona Profiles (Joni Salminen)
The document reports on a study that examined how the inclusion of different types of photos in automatically generated online persona profiles impacts people's perceptions of confusion and informativeness. The study found that including contextual photos increased perceived informativeness while including multiple similar attribute photos increased confusion. The results suggest that including a headshot photo and contextual photos of the same person provides the optimal persona profile design.
The document discusses various challenges in social network analysis including collecting and extracting network data at scale from sources such as the web, validating automated data extraction methods, and developing algorithms and software that can analyze large and complex network datasets. It also outlines different network analysis methods, visualization and simulation techniques, and recommendations for how tools can better support networking, referrals, and workflows across multiple data sources and programs. Scaling methods and algorithms to very large network sizes and developing standards to integrate diverse data and tools are highlighted as key challenges.
This document discusses monitoring and analyzing online communities. It begins by outlining tools for monitoring social media mentions, sentiment, discussion activity and more. It then discusses measuring social media usage in companies and tools for analyzing community features like influence, opinions and geolocation. The document explores merging offline and online social networks using sensors and integrating physical presence data with online profiles and semantic analysis. It provides examples of tracking face-to-face contact networks and analyzing characteristics of offline social networks.
The human face of AI: how collective and augmented intelligence can help sol... (Elena Simperl)
This document summarizes a talk on how collective and augmented intelligence can help solve societal problems. It discusses how AI depends on human input, how collective intelligence benefits AI, and provides examples of using human computation and crowdsourcing to support disaster relief and conduct urban auditing. It also describes challenges in making crowdsourcing sustainable and assessing data quality, and emphasizes the need for iterative design of human-AI systems to bring together human, collective, and computational intelligence.
Fuzzy and ANN Based Mining Approach Testing For Social Network Analysis (IJERA Editor)
Fast and appropriate Social Network Analysis (SNA) tools and techniques are required to collect and classify opinion scores on social networking sites, since grouping around a wrong opinion may create problems for a society or country. SNA is a popular means for researchers because the number of users and groups on social sites grows daily, and a large group may influence others. In this paper, we recommend a hybrid model of opinion recommendation systems, for a single user and for a collective community respectively, based on social liking and influence network theory. By collecting data on users' social networks and preferences (likes), we designed an improved hybrid prototype to imitate social influence through liking and sharing information among groups. The significance of this paper is to analyze the suitability of ANN and fuzzy-set methods, combined in a hybrid manner, for classifying social web sites. First, we apply Artificial Neural Network (ANN) techniques to social media data classification, using contemporary methods different from the conventional methods of statistics and data analysis; next, we apply the fuzzy approach as a way to overcome the uncertainty that is always present in social media analysis. We give a brief overview of the main ideas and recent results of social network analysis, and we point to relationships between social network analysis and classification approaches. This research suggests a hybrid classification model built on fuzzy sets and artificial neural networks (HFANN). Information Gain and three popular social sites are used to collect feature data that are then used to train and test the proposed methods. This approach combines the advantages of ANN and fuzzy sets in classification accuracy, utilizing social data and the knowledge base available in hate lexicons.
Large scale social recommender systems and their evaluation (Mitul Tiwari)
This talk will give an overview of some of the large-scale recommender systems at LinkedIn, such as People You May Know (PYMK) and Suggested Skills Endorsements. It will also address how we formulate machine learning modeling problems to build these recommender systems and evaluate our models. Modeling for these recommender systems involves careful feature engineering and incorporating user feedback, both explicit and implicit. The talk will describe how we do feature engineering through an example of modeling organizational overlap between people for link prediction and community detection over the social graph, and how we incorporate user feedback through impression discounting of ignored recommended results. Careful evaluation of modeling changes, both offline and online (A/B testing), is an inherent part of measuring the effectiveness of our recommender systems. We have built a sophisticated end-to-end A/B testing and evaluation platform called XLNT at LinkedIn, and this talk will also cover how we use XLNT for power analysis, A/B testing, and measuring confidence in the results.
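The power analysis mentioned for XLNT can be illustrated with the standard normal-approximation sample-size calculation for a two-proportion A/B test (a generic textbook formula, not LinkedIn's actual implementation; the helper name and example rates are hypothetical):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_base, p_treat, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # power quantile
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_base - p_treat) ** 2)

# Users needed per arm to detect a lift from 10% to 11% click-through
# at alpha = 0.05 with 80% power:
print(sample_size_per_arm(0.10, 0.11))
```

The quadratic dependence on the effect size (p_base - p_treat) is why detecting small lifts at a large site requires experiments over many thousands of users.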
CIDR (Classless Inter-Domain Routing) is a system for assigning and managing IP addresses more efficiently. CIDR allows variable-length network prefixes instead of being limited to the traditional address classes, grouping existing address blocks together. This reduces the size of routing tables and organizes inter-domain routes hierarchically, enabling better administration of the available IP addresses as the Internet continues to grow.
The document discusses the transition from classful networks to classless inter-domain routing (CIDR) networks. CIDR allows for more flexibility in assigning blocks of IP addresses and improves routing efficiency by allowing routes to be aggregated. Valid CIDR blocks must have the host bits set to zero so the address falls on the network boundary. Large blocks are allocated by regional organizations like RIPE and then assigned to ISPs and other organizations in smaller blocks.
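The host-bits rule can be checked directly with Python's standard `ipaddress` module, which rejects a CIDR block whose host bits are nonzero (the example addresses are illustrative):

```python
import ipaddress

# A valid CIDR block must have all host bits set to zero; ipaddress
# enforces this with strict=True (the default).
valid = ipaddress.ip_network("198.51.100.0/24")

try:
    ipaddress.ip_network("198.51.100.1/24")   # host bits set: rejected
except ValueError as err:
    print(err)

# strict=False rounds down to the enclosing network boundary instead.
print(ipaddress.ip_network("198.51.100.1/24", strict=False))
```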
This document discusses route summarization and Classless Interdomain Routing (CIDR). It aims to describe how to implement route summarization, calculate summary routes, and explain CIDR implementation. The key topics covered are summarizing routes within an octet, summarizing addresses in a VLSM network, and how CIDR alleviates address exhaustion and reduces routing table sizes by allowing blocks of addresses to be summarized without regard to classful boundaries. Examples are provided to illustrate route summarization and CIDR.
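Route summarization can be demonstrated with the standard `ipaddress` module, which aggregates contiguous prefixes exactly as a summarizing router would (the example prefixes are illustrative):

```python
import ipaddress

# Four contiguous /24 routes summarize into one /22, shrinking the
# routing table from four entries to a single entry.
routes = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in range(4)]
summary = list(ipaddress.collapse_addresses(routes))
print(summary)  # [IPv4Network('192.168.0.0/22')]
```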
Unicast involves sending data from one computer to another, with one sender and one receiver. Multicast sends data to a group of devices that have joined the multicast group, with one sender but multiple potential receivers. Broadcast sends data from one computer that is then forwarded to all connected devices, with one sender and all devices receiving the broadcast traffic.
This document discusses subnetting, supernetting, and classless addressing. It provides examples of how to calculate subnet masks, subnet addresses, supernet masks, and address ranges for subnets and supernetworks. It also discusses variable length subnetting and classless inter-domain routing (CIDR) notation.
This document discusses classless addressing and variable-length subnetting. It begins by explaining that in classless addressing, variable-length blocks of IP addresses are assigned without class boundaries. It then provides examples of how to determine the network address, broadcast address, and number of addresses given a classless IP address and prefix length. The document also describes how organizations can create subnets within a granted address block to meet their needs using variable-length subnetting.
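The kind of worked example the document describes, deriving the network address, broadcast address, and block size from a classless address and prefix length, reduces to a few lines with Python's standard `ipaddress` module (the sample address 205.16.37.39/28 is illustrative):

```python
import ipaddress

# strict=False accepts a host address and rounds down to its block.
net = ipaddress.ip_network("205.16.37.39/28", strict=False)
print(net.network_address)    # first address of the /28 block
print(net.broadcast_address)  # last address of the block
print(net.num_addresses)      # 2**(32 - 28) = 16 addresses
```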
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach (Andre Freitas)
Big Data is based on the vision of providing users and applications with a more complete picture of the reality supported and mediated by data. This vision comes with the inherent price of data variety, i.e. data which is semantically heterogeneous, poorly structured, complex and with data quality issues. Despite the hype on technologies targeting data volume and velocity, solutions for coping with data variety remain fragmented and with limited adoption. In this talk we will focus on emerging data management approaches, supported by semantic technologies, to cope with data variety. We will provide a broad overview of semantic computing approaches and how they can be applied to data management challenges within organizations today. This talk will allow the audience to have a glimpse into the next-generation, Big Data-driven information systems.
Research on collaborative information sharing systems (Davide Eynard)
The document discusses research on collaborative information sharing systems and participative systems. Specifically, it discusses using semantics to help organize information contributed by users on collaborative systems like wikis and folksonomies. It proposes using ontologies and semantic annotations on different levels of wiki systems and expanding folksonomies with ontologies to address limitations like lack of hierarchy, precision and recall in folksonomies. Fuzzy set theory is also discussed as a way to describe resources through membership in categories defined by tags to enable more intuitive querying of folksonomies.
From Knowledge Bases to Knowledge Infrastructures for Intelligent Systems (Mathieu d'Aquin)
1) The document discusses how knowledge representation and ontologies have evolved from closed knowledge bases for specific domains to open knowledge infrastructures that can handle large amounts of diverse data and information at scale.
2) It provides examples of how ontologies and semantic technologies are being used to build intelligent systems that can search, integrate, and automatically process and analyze large datasets.
3) Going forward, ontologies will play an important role in populating knowledge from data and dialog, enabling the automatic exploitation of data by autonomous agents, and enhancing data analytics and mining through semantic representation of datasets, tools, and policies.
Designing for Collaboration: Challenges & Considerations of Multi-Use Informa... (Stephanie Steinhardt)
Slides assembled for Human Centered Design & Engineering Preliminary Exam talk at the University of Washington Allen Library Auditorium 4.8.2011.
Thanks to Mark Zachry, David McDonald, Elly Searle, Carol Allen, and NSF IIS-0811210.
Social Network Analysis (SNA) and its implications for knowledge discovery in... (ACMBangalore)
Social Network Analysis (SNA) and its implications for knowledge discovery in Informal Networks- Talk by Dr Jai Ganesh, SETLabs, Infosys at Search and Social Platforms tutorial, as part of Compute 2009, ACM Bangalore
Obama's 2012 reelection campaign leveraged big data analytics to build detailed profiles of potential voters using disparate data sources. They combined this data to create a "single view" of individuals to optimize fundraising, volunteer mobilization, and get-out-the-vote strategies. Predictive modeling was used to score voters by likelihood of donating or voting Democrat. Resources were targeted to persuadable voters in swing states. Regular polling provided insights to track debate impacts and allocate campaign efforts. The campaign's data-driven approach helped achieve record fundraising and turnout in swing states.
The term 'Data Scientist' arose fairly recently to express the specialised recruitment needs of certain well-known data-driven Silicon Valley firms. It signifies a mix of diverse and rare talents, mostly drawing from Computer Science (with emphasis on Big Data), Statistics and Machine Learning. In this talk, we will attempt to briefly survey the state-of-the-art both in terms of problems and solutions at the vanguard of Data Science. We will cover both novel developments, as well as centuries-old best practices, in an attempt to demonstrate that Data Science is indeed a Science, in the full sense of the word. This talk represents part of a seminar series that the speaker has given across the world, including Google (Mountain View), Cisco (San Jose) and Aviva Headquarters (London), and represents joint work with Professor David Hand (OBE).
A top-down look at current industry and technology trends for Big Data, Data Analytics and Machine Learning (cognitive technologies, AI etc.). New slides added for Ark Group presentation on 1st December 2016.
Querying Heterogeneous Datasets on the Linked Data Web (Edward Curry)
The growing number of datasets published on the Web as linked data brings both opportunities for high data availability and challenges inherent to querying data in a semantically heterogeneous and distributed environment. Approaches used for querying siloed databases fail at Web-scale because users don't have an a priori understanding of all the available datasets. This article investigates the main challenges in constructing a query and search solution for linked data and analyzes existing approaches and trends.
This document discusses leveraging social big data and the evolution from existing rigid operations to predictive analytics using social media. It begins with an overview of handouts and reference materials on big data, Hadoop, Spark, and data science projects. It then discusses areas for conversation around social content, structure and analytics, data science primers and resources, and data science innovation. It presents a roadmap showing the evolution from rigid and siloed operations to being more flexible, connected, adaptive and predictive using social media. Finally, it discusses types of intentionality and how social CRM can integrate social data.
The document discusses how Alation and Trifacta use machine learning to help users understand and prepare data. Alation provides data discovery and cataloging capabilities to help users find, understand, and trust data. Trifacta provides self-service data preparation tools powered by machine learning to help users clean, structure, and validate data. The combination of Alation and Trifacta allows for an open and integrated solution for data wrangling, discovery, and governance.
The document discusses open data sharing and the Research Data Alliance (RDA). RDA aims to build social and technical bridges to enable open sharing of data across disciplines. It has over 3700 members from 110 countries working in 60+ groups. RDA addresses issues like interoperability standards, data citation practices, and workforce training to facilitate greater data access and use. The presentation highlights several RDA working groups focusing on specific domains like wheat research and chemistry. It emphasizes that open problem solving and involving stakeholders are key to making data infrastructure successful.
Cross-discipline collaboration benefits from group thinking: a consolidation of soft systems methodology and user-focused design that starts with design thinking, which sees clients, designers, developers and information architects working together to address user problems and needs. As with any great adventure, design thinking starts with exploration and discovery. This presentation examines the high-level tenets of systems thinking, expands the scope of user thinking to include the tools and devices that users employ, and delves into the specifics of design thinking, its methods and outcomes.
#P2Pvalue at Share and inspire: Infoday on CAPS in Horizon 2020 (P2Pvalue)
This document summarizes the first year of research by P2Pvalue on peer production and the commons. It mapped over 400 cases across 30 areas of collaborative production involving common resources, open access, and peer-to-peer relationships. Research methods included statistical analysis of over 300 cases, surveys of 250 participants, 20 in-depth case studies, and legal analysis of four cases. Key findings included the diversity of infrastructure models, with most centralized but some moving toward federation, and the development of commons-friendly licenses to protect collaborative production. Overall it was an exciting first year that expanded understanding of peer production and its conditions for "success."
The document discusses how Alation and Trifacta use machine learning to help users understand and prepare data. Alation provides data discovery and cataloging capabilities to help users find, understand, and trust data. Trifacta provides self-service data preparation tools powered by machine learning to help users clean, structure, and validate data. The combination of Alation and Trifacta allows for an open and integrated solution for data wrangling, discovery, and governance.
The document discusses open data sharing and the Research Data Alliance (RDA). RDA aims to build social and technical bridges to enable open sharing of data across disciplines. It has over 3700 members from 110 countries working in 60+ groups. RDA addresses issues like interoperability standards, data citation practices, and workforce training to facilitate greater data access and use. The presentation highlights several RDA working groups focusing on specific domains like wheat research and chemistry. It emphasizes that open problem solving and involving stakeholders are key to making data infrastructure successful.
Cross discipline collaboration benefits from group think, a consolidation of soft system methodology and user focused design that all starts with design thinking that sees clients, designers, developers and information architects working together to address user problems and needs. As with any great adventure, design thinking starts with exploration and discovery.This presentation examines the high level tenants of system thinking, expands the scope of user thinking to include tools and devices that users employ to find out designs and delve into the specifics of design thinking, its methods and outcomes.
#P2Pvalue at Share and inspire: Infoday on CAPS in Horizon 2020P2Pvalue
This document summarizes the first year of research by P2Pvalue on peer production and the commons. It mapped over 400 cases across 30 areas of collaborative production involving common resources, open access, and peer-to-peer relationships. Research methods included statistical analysis of over 300 cases, surveys of 250 participants, 20 in-depth case studies, and legal analysis of four cases. Key findings included the diversity of infrastructure models, with most centralized but some moving toward federation, and the development of commons-friendly licenses to protect collaborative production. Overall it was an exciting first year that expanded understanding of peer production and its conditions for "success."
The web of data: how are we doing so far?Elena Simperl
This document summarizes Elena Simperl's presentation on "The web of data: how are we doing so far?". Some key points:
- The web has shaped our understanding and interactions with data in many ways like answering questions, sharing data online, and publishing data for others to use.
- However, the theory and practice of the web of data are different, and we are at a crucial moment in how data is published and used on the web.
- Open data portals need to improve in areas like adopting standards, co-locating documentation, and making data more usable and discoverable in order to increase data reuse.
The web of data: how are we doing so farElena Simperl
The document summarizes the current state of open data and the web of data. It discusses how data is being shared online through datasets, digital traces, and algorithms. While there is a lot of annotated data available, especially about locations and businesses, uptake of linked data and vocabulary reuse is still low. The document also reviews guidelines for improving data organization, discoverability, documentation, and engagement. Finally, it discusses ongoing research on data search behavior, sensemaking practices, and the potential for generative AI to help with data understanding and reuse.
This document provides an introduction to data visualization for analysis. It discusses exploring datasets that can include textual, numerical, and other data. The document outlines the data visualization process and mentions some common tools and methods used. It also discusses extending your toolset and provides an example exercise exploring a dataset and creating a visualization to gain insights. The objective is to appreciate the variety of techniques available to digital humanities scholars for data analysis and visualization.
Claremont Report on Database Research: Research Directions (Le Gruenwald)infoblog
This is a set of slides from the Claremont Report on Database Research, see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a "Research Directions" talk by "Le Gruenwald." (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)
Claremont Report on Database Research: Research Directions (Eric A. Brewer)infoblog
This is a set of slides from the Claremont Report on Database Research, see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a "Research Directions" talk by "Eric A. Brewer". (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)
Claremont Report on Database Research: Research Directions (Rakesh Agrawal)infoblog
Search and data have a virtuous cycle where more data leads to better search results which provides more data. New search applications in personal health data mining and distributed knowledge creation in education present opportunities for database research. Technologies now better support capturing personal information while cloud computing reduces storage costs. Personal health analytics and data mining raise privacy and customization challenges.
Claremont Report on Database Research: Research Directions (Gerhard Weikum)infoblog
This is a set of slides from the Claremont Report on Database Research, see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a "Research Directions" talk by "Gerhard Weikum." (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)
Claremont Report on Database Research: Research Directions (Beng Chin Ooi)infoblog
This is a set of slides from the Claremont Report on Database Research, see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a "Research Directions" talk by "Beng Chin Ooi." (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)
Claremont Report on Database Research: Research Directions (Yannis E. Ioannidis)infoblog
This is a set of slides from the Claremont Report on Database Research, see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a "Research Directions" talk by "Yannis E. Ioannidis." (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)
Claremont Report on Database Research: Research Directions (Donald Kossmann)infoblog
This is a set of slides from the Claremont Report on Database Research, see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a "Research Directions" talk by "Donald Kossmann." (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)
Claremont Report on Database Research: Research Directions (Johannes Gehrke)infoblog
This is a set of slides from the Claremont Report on Database Research, see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a "Research Directions" talk by "Johannes Gehrke." (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)
Claremont Report on Database Research: Research Directions (Alon Y. Halevy)infoblog
This is a set of slides from the Claremont Report on Database Research, see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a "Research Directions" talk by "Alon Y. Halevy." (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)
Claremont Report on Database Research: Research Directions (Anastasia Ailamaki)infoblog
This is a set of slides from the Claremont Report on Database Research, see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a "Research Directions" talk by "Anastasia Ailamaki." (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)
The document presents SpotSigs, a method for robust and efficient near-duplicate detection in large web collections. SpotSigs extracts signatures from documents using stopword-based n-grams to focus on natural language content. It then clusters similar documents using a self-tuning algorithm while partitioning the collection and pruning the inverted index to improve efficiency. Evaluation on news articles and TREC data shows SpotSigs achieves comparable or better recall than state-of-the-art techniques like shingling and locality-sensitive hashing, with better runtime performance.
Database Research Principles Revealed (Small Size)infoblog
This document summarizes four people who were instrumental to the speaker's research career. It discusses how her manager Laura Haas ensured she could focus on research, her collaborator Stefano Ceri combined details and intuition for success, and colleagues Hector Garcia-Molina and Jeff Ullman who mentored her, co-authored books with her, and supported her both professionally and personally.
The document discusses research principles from Jennifer Widom including choosing research topics by dropping fundamental assumptions, thoroughly developing the data model, query language, and system, and promptly disseminating results through publications and software. It provides examples of tricky semantics in new data models and emphasizes reusing relational semantics when possible and not being secretive with research work.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
CIDR 2009: Jeff Heer Keynote
1. Voyagers and Voyeurs
Supporting Social Data Analysis
Jeffrey Heer
Computer Science Department
Stanford University
CIDR 2009 – Monterey, CA
5 January 2009
4. Observations
Groups spent more time in front of the visualization than individuals.
Friends encouraged each other to unearth relationships, probe community boundaries, and challenge reported information.
Social play resulted in informal analysis, often driven by story-telling of group histories.
10. Social Data Analysis
Visual sensemaking can be social as well as cognitive.
Analysis of data coupled with social interpretation and deliberation.
How can user interfaces catalyze and support collaborative visual analysis?
13. Voyagers and Voyeurs
Complementary faces of analysis
Voyager – focus on visualized data
Active engagement with the data
Serendipitous comment discovery
Voyeur – focus on comment listings
Investigate others’ explorations
Find people and topics of interest
Catalyze new explorations
22. Social Data Analysis In Action
1. Discussion and Debate
2. Text is Data, Too
3. Data Integrity and Cleaning
4. Integrating Data in Context
5. Pointing and Naming
For each, some thoughts on future directions.
I asked my colleagues: if you could give database researchers a wish list, what would it be?
33. WANTED: Structured Conversation
Reduce the cost of synthesizing contributions
Can we represent data, visualizations, and social activity in a unified data model?
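A minimal sketch of what such a unified model might look like (all names are hypothetical, not any particular system's schema): comments, views, and selections share one representation, so social activity can be joined back to the exact data it discusses.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Dataset:
    name: str
    rows: List[dict]

@dataclass
class Selection:
    # A declarative predicate over rows, not pixel coordinates,
    # so it survives re-visualization of the same data.
    predicate: Callable[[dict], bool]

@dataclass
class View:
    dataset: Dataset
    encoding: str  # e.g. "line", "bar", "map"

@dataclass
class Comment:
    author: str
    text: str
    view: View
    selection: Selection  # what the comment points at

    def referenced_rows(self) -> List[dict]:
        # Joining social activity back to data is one query, not a scrape.
        return [r for r in self.view.dataset.rows
                if self.selection.predicate(r)]

census = Dataset("census", [{"year": 1990, "pop": 248},
                            {"year": 2000, "pop": 281}])
c = Comment("alice", "Growth accelerates here",
            View(census, "line"),
            Selection(lambda r: r["year"] >= 2000))
print([r["year"] for r in c.referenced_rows()])  # [2000]
```

Because the comment stores a predicate rather than a screenshot region, the same annotation can be reattached when the data is re-visualized.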
35. Visualization Popularity
[Bar charts: percentage of visualizations by type (Tag Cloud, Bubble Graph, Word Tree, Bar Chart, Maps, Network Diagram, Treemap, Matrix Chart, Line Graph, Scatterplot, Stacked Graph, Pie Chart, Histogram) for the Many-Eyes and Swivel services.]
Over 1/3 of Many-Eyes visualizations use free text
38. WANTED: Better Tools for Text
Statistical Analysis of text (with ties to source!)
Entity Extraction
Aggregation and Comparison of texts
Get a “global” view of documents
We can do better than Tag Clouds (!?)
Use text analysis tools to enable analysis of structured conversation by the community.
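As one sketch of doing better than tag clouds with nothing but term counts, one could rank terms by how much more frequent they are in one comment set than in a background corpus, using a simple smoothed log-ratio (function name and data are illustrative):

```python
from collections import Counter
import math

def distinctive_terms(docs_a, docs_b, top=3):
    """Rank terms by how much more frequent they are in corpus A than in
    corpus B (add-one smoothing): a step up from raw tag-cloud counts."""
    ca = Counter(w for d in docs_a for w in d.lower().split())
    cb = Counter(w for d in docs_b for w in d.lower().split())
    ta, tb = sum(ca.values()), sum(cb.values())
    score = {w: math.log((ca[w] + 1) / (ta + 1))
              - math.log((cb[w] + 1) / (tb + 1))
             for w in ca}
    return sorted(score, key=score.get, reverse=True)[:top]

comments = ["the data looks wrong here", "wrong units in the data"]
background = ["nice chart", "interesting trend in the chart"]
print(distinctive_terms(comments, background))
```

Aggregating many such comparisons across documents is one route to the "global view" the slide asks for.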
44. Content Analysis of Comments
[Bar charts: percentage of comments by category (Observation, Question, Hypothesis, Data Integrity, Linking, Socializing, System Design, Testing, Tips, To-Do, Affirmation) for the sense.us and Many-Eyes services.]
16% of sense.us comments and 10% of Many-Eyes comments reference data quality or integrity.
45. WANTED: Data Cleaning Tools
Reshape data, reformat rows & columns
Handle missing data: label, repair, interpolate
Entity resolution and de-duplication
Group related values into aggregates
Assist table lookups & data transforms
Provide tools in situ to leverage the collective
Transparency requires provenance
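Two of the listed repairs, interpolating missing values and collapsing duplicate entities, can be sketched in a few lines (illustrative code, not any particular tool):

```python
def interpolate_gaps(values):
    """Fill None gaps by linear interpolation between known neighbors:
    one of the missing-data repair strategies the slide lists."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out

def dedupe(names):
    """Naive entity resolution: collapse case and whitespace variants,
    keeping the first spelling seen."""
    seen = {}
    for n in names:
        seen.setdefault(" ".join(n.lower().split()), n)
    return list(seen.values())

print(interpolate_gaps([10.0, None, None, 16.0]))  # [10.0, 12.0, 14.0, 16.0]
print(dedupe(["IBM", "ibm", " IBM "]))             # ['IBM']
```

For transparency, a real tool would also record which cells were repaired and how, since every imputed value is a provenance event.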
53. WANTED: In-Situ Data Integration
Search for and suggest related data or views
User input for types, schema matching, or data
Apply in context of the current task
But record mappings for future use
Record provenance: chain of data sources
Examples: Google Web Tables, Pay-As-You-Go, Stanford Vispedia, Utah VisTrails
59. Visual Queries
Model selections as declarative queries over interface elements or underlying data
(-118.371 ≤ lon AND lon ≤ -118.164) AND (33.915 ≤ lat AND lat ≤ 34.089)
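The slide's bounding-box selection, written out as an executable predicate over data tuples rather than screen pixels (illustrative):

```python
def in_selection(point):
    """The slide's declarative selection: a conjunction of range
    predicates on longitude and latitude."""
    lon, lat = point
    return (-118.371 <= lon <= -118.164) and (33.915 <= lat <= 34.089)

points = [(-118.24, 34.05),   # inside the box (Los Angeles area)
          (-122.42, 37.77)]   # outside the box (San Francisco)
print([in_selection(p) for p in points])  # [True, False]
```

Because the selection is expressed over data values, not pixels, the same predicate can be re-evaluated against new data or retargeted to a different visual encoding, which is exactly what the next slide exploits.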
60. Visual Queries
Model selections as declarative queries over interface elements or underlying data
Applicable to dynamic, time-varying data
Retarget selection across visual encodings
Support social navigation and data mining
61. WANTED: Data-Aware Annotation
Meta-queries linking annotations to views
Visually specifying notification triggers
Annotating data aggregates (use lineage?)
Unified model (again!) to facilitate reference
How to make it work at scale?
How else to use machine-readable annotations?
Can annotations be used to steer data mining?
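One way a data-aware annotation could double as a notification trigger is to store the query rather than the result, and re-evaluate it as data changes (a sketch with a hypothetical API):

```python
class Annotation:
    """An annotation anchored to a declarative query, so it stays
    meaningful on dynamic data and can fire notifications."""

    def __init__(self, note, query, threshold):
        self.note = note
        self.query = query          # declarative selection over rows
        self.threshold = threshold  # trigger condition on the aggregate

    def check(self, rows):
        # Re-evaluate the annotated aggregate against current data.
        total = sum(r["value"] for r in rows if self.query(r))
        return total > self.threshold  # True means: notify watchers

ann = Annotation("Watch Q4 spike",
                 lambda r: r["quarter"] == "Q4",
                 threshold=100)
print(ann.check([{"quarter": "Q4", "value": 80}]))   # below threshold
print(ann.check([{"quarter": "4".join(["Q", ""]) or "Q4", "value": 80},
                 {"quarter": "Q4", "value": 130}]))  # above threshold
```

Because the annotation is machine-readable, the same trigger predicate could in principle feed a data-mining loop, as the slide asks.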
63. Social Data Analysis
Collective analysis of data supported by social interaction.
1. Discussion and Debate
2. Text is Data, Too
3. Data Integrity and Cleaning
4. Integrating Data in Context
5. Pointing and Naming
64. Summary
As visualization becomes common on the web, opportunities for collaborative analysis abound.
Weave visualizations into the web: data access, visualization creation, view sharing and pointing.
Support discovery, discussion, and integration of contributions to leverage the collective.
Improve both processes and technologies for communication and dissemination.
65. Parting Thoughts
Visualizations may have a catalytic effect on social interaction around data.
Encourage participation by minimizing or offsetting interaction costs.
Provide incentives by fostering the personal relevance of the data.
66. Acknowledgements
@ Berkeley: Maneesh Agrawala, Wes Willett, danah boyd, Marti Hearst, Joe Hellerstein
@ IBM: Martin Wattenberg, Fernanda Viégas
@ PARC: Stu Card
@ Tableau: Jock Mackinlay, Chris Stolte, Christian Chabot
68. With a collaborative spirit, with a collaborative platform where people can upload data, explore data, compare solutions, discuss the results, build consensus, we can engage passionate people, local communities, media and this will raise - incredibly - the amount of people who can understand what is going on.
And this would have fantastic outcomes: the engagement of people, especially new generations; it would increase knowledge, unlock statistics, improve transparency and accountability of public policies, change culture, increase numeracy, and in the end, improve democracy and welfare.
Enrico Giovannini, Chief Statistician, OECD. June 2007.