A presentation on the value and the risks of identifying, mining, and visualizing data, described in a big-data-aware setting. The presentation is meant for a wide audience and does not require a deep technical background.
The original presentation was given at the KAS Seminar on Data Journalism in December 2017.
Introduction
Data: a new era of journalism (?)
Lying with data
Summary and conclusion
References
Motivation
What is data?
From data to big, linked, open data
A potential future: Nice projects and future tools
George Giannakopoulos (ggianna@iit.demokritos.gr) Data rules, while data rule
Why are we here?
Learn the basics of data
Understand value and risks of using data
Avoid pitfalls when reporting data-based findings
Identify the role of the journalist with respect to data
10. .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Introduction
Data: a new era of journalism (?)
Lying with data
Summary and conclusion
References
Motivation
What is data?
From data to big, linked, open data
A potential future: Nice projects and future tools
Why can I help?
Not a journalist
Not a visualization expert
Data mining for over 10 years
Communicating findings for over 10 years
Reviewing for over 10 years
Data and friends
Data: measured or calculated value of a property
Information: data extended with interpretation
Knowledge: contextualized, interpreted, validated data, supporting action (e.g. prediction)
See also: Βλαχάβας et al. 2002, Chapter 3, Boisot and Canals 2004, Tuomi 1999
Kinds of data (based on structure)
Unstructured: free text, blog entries
Fully structured: databases, data tables, spreadsheets
Semi-structured: eXtensible Markup Language (XML), Resource Description Framework (RDF)
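To make the three levels of structure concrete, here is a minimal sketch. It writes the same hypothetical fact (a temperature reading; station, date and value are invented for illustration) as free text, as a CSV table and as XML, then extracts the value from each. The structured forms use Python's standard `csv` and `xml.etree.ElementTree` parsers. The free-text version needs a hand-written heuristic, which is an assumption of this sketch and is fragile by design.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The same HYPOTHETICAL fact at three levels of structure.
free_text = "Yesterday the Athens station recorded a temperature of 31 degrees."
table = "station,date,temperature_c\nAthens,2017-07-01,31\n"
xml_doc = '<measurement station="Athens" date="2017-07-01"><temperature unit="C">31</temperature></measurement>'

def temperature_from_table(text):
    # Fully structured: the column name tells us exactly where the value is.
    rows = list(csv.DictReader(io.StringIO(text)))
    return int(rows[0]["temperature_c"])

def temperature_from_xml(text):
    # Semi-structured: tags name the parts, even if the schema is flexible.
    root = ET.fromstring(text)
    return int(root.find("temperature").text)

def temperature_from_free_text(text):
    # Unstructured: a naive heuristic, taking the number right before "degrees".
    words = text.replace(".", " ").split()
    for i, word in enumerate(words):
        if word == "degrees" and i > 0 and words[i - 1].isdigit():
            return int(words[i - 1])
    return None  # the heuristic found nothing it recognises

print(temperature_from_free_text(free_text),
      temperature_from_table(table),
      temperature_from_xml(xml_doc))
```

All three calls return 31 here. The free-text heuristic returns `None` as soon as someone writes "thirty-one degrees", while the table and XML readers do not depend on phrasing.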
Where can we find data?
Finance: index values, yearly reports
Health: health records, lab measurements
More? What do you think?
Do we generate data?
Social media: posts, comments
Mobile device use: position, usage of applications
More?
Who uses data?
State: policy making, international relations, public image
Businesses: marketing, due diligence, risk assessment, business policy
Researchers: build hypotheses, confirm findings, discover relations
Citizens: understand everyday changes, understand choices, plan ahead
Why are data useful?
Problem indicators (e.g. deviations)
Common basis for discussion
Measurable
Considered unbiased
Considered verifiable
Oftentimes interlinkable
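One common way to turn "deviations" into a problem indicator is a z-score check: flag values that lie more than a chosen number of standard deviations from the mean. The sketch below assumes that technique. The talk does not name a specific method, and the monthly figures are invented. A flagged value is a lead to investigate, not a finding.

```python
from statistics import mean, pstdev

def flag_deviations(values, threshold=2.0):
    """Return the indices of values more than `threshold` standard
    deviations away from the mean (a simple z-score test)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# HYPOTHETICAL monthly expenses of a public body, in thousands of euros.
monthly = [102, 98, 101, 99, 100, 97, 103, 250, 101, 100, 99, 98]
print(flag_deviations(monthly))  # [7]
```

Only the eighth month (index 7) is flagged. A single large outlier also inflates the standard deviation it is measured against, so on small or messy series a median-based rule may be more robust.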
Let us summarize
Data can be the basis for knowledge
Data have various levels of structure
Everyone generates and consumes data
Data can have value
Data flavours
Big data: large scale, volatile, challenging to follow
Linked data: inherent interconnectivity
Open data: available in a meaningful manner
What is Big Data?
Data at a new scale... and more
Examples of Big Data
Social media posts: Gbytes per day on Twitter (50 million tweets/day)
Wind turbine sensors: Tbytes per day per turbine (kHz sampling of values)
Bioinformatics data: Tbytes per day per machine (up to 3 billion reads per run)
Overall: 2.5 exabytes of data generated per day (and going up)
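Figures like these come from simple rate-times-size arithmetic, which is worth redoing before quoting them. The sketch below checks orders of magnitude. The 50 million tweets/day is the slide's figure. The ~2 KB per tweet (including metadata) and the turbine setup (100 channels, 10 kHz, 8-byte samples) are assumptions chosen only to illustrate the calculation.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
GB = 10**9
TB = 10**12

def daily_bytes_from_records(records_per_day, bytes_per_record):
    """Daily volume of a stream of discrete records (e.g. posts)."""
    return records_per_day * bytes_per_record

def daily_bytes_from_sensors(channels, sample_rate_hz, bytes_per_sample):
    """Daily volume of continuously sampled sensor channels."""
    return channels * sample_rate_hz * SECONDS_PER_DAY * bytes_per_sample

# 50 million tweets/day (slide figure) x ~2 KB each (ASSUMPTION).
tweets = daily_bytes_from_records(50_000_000, 2_000)
# HYPOTHETICAL turbine: 100 channels sampled at 10 kHz, 8 bytes per sample.
turbine = daily_bytes_from_sensors(100, 10_000, 8)

print(f"tweets:  {tweets / GB:.0f} GB/day")   # 100 GB/day
print(f"turbine: {turbine / TB:.2f} TB/day")  # 0.69 TB/day
```

Changing any assumed parameter shifts the result proportionally. The point is the order of magnitude (gigabytes vs terabytes per day), not the exact figure.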
Features of Big Data
Volume: size
Velocity: speed of change
Variety: heterogeneity and difficulty to read and combine
Veracity: uncertainty and difficulty of validation
Value: difficulty of turning data into actionable knowledge
What is Linked Data?
Small statements
<George, loves, Georgia>
Fragments of knowledge
<Georgia, listensTo, folkRock>
<Dylan, plays, folkRock>
Powerful when combined
Suggest to George a Dylan CD as a gift for Georgia
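The gift suggestion comes from chaining the small statements together. Below is a minimal sketch, assuming the triples are held as plain Python tuples. A real linked-data system would use RDF with URIs and a query language such as SPARQL; the helper names here are hypothetical.

```python
# Each statement is a (subject, predicate, object) triple, as on the slides.
triples = {
    ("George", "loves", "Georgia"),
    ("Georgia", "listensTo", "folkRock"),
    ("Dylan", "plays", "folkRock"),
}

def objects(subject, predicate):
    """All o such that (subject, predicate, o) is a known statement."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is a known statement."""
    return {s for s, p, o in triples if p == predicate and o == obj}

def gift_ideas(person):
    """(loved one, artist) pairs: artists playing a genre the loved one listens to."""
    ideas = set()
    for loved in objects(person, "loves"):
        for genre in objects(loved, "listensTo"):
            for artist in subjects("plays", genre):
                ideas.add((loved, artist))
    return ideas

print(gift_ideas("George"))  # {('Georgia', 'Dylan')}
```

No single triple contains the suggestion; it appears only when the three fragments are joined on their shared terms ("Georgia", "folkRock").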
Examples of Linked Data
Web annotations: hidden in web pages, blogs, social media (e.g. Facebook Graph)
DBpedia: linked-data view of Wikipedia
Nano-publications: scientific results and data as statements
Features of Linked Data
Usually automatically generated
Computer-friendly
No meaning
What is Open Data?
Openly available data
...usable
...reusable under well-specified licences
Usually public data
Examples of Open Data
Country-wide statistics and demographics
Geo-locations of monuments
Public consultation results
Let us summarize
Big data: large, volatile, difficult to handle
Linked data: fragments of knowledge, computer-friendly
Open data: publicly available, reusable
Your Data Stories
Easily consume data
Combine and integrate data
Empower with social media feedback
yourdatastories.eu
DemocracIT
Build on consultations
Quantify problems and biases
Crowdsourcing
democracit.org
More to come?
Support editing in real-time
Find related data
Suggest visualization
Stay tuned.
An informal break
Teaser: Privacy, Security, Anonymity and Data
Data: a new era of journalism (?)
The value of data
Opportunities and risks in using data
Data analysis and visualization: mistakes and pitfalls
What data can do
Data contain stories
Data support stories
Data verify stories
Case 1: Money trails
Find Mr Smith's payments (public money) on Diavgeia
Claim that researchers get too much money!
Case 1: Risks
Source idiosyncrasies: Diavgeia problems
Limited (?) documentation
Lack of cross-check method
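The talk does not detail the Diavgeia problems. The sketch below shows one hypothetical kind of source idiosyncrasy: the same decision appearing twice, for example as an original and a corrected re-posting. It shows how that alone can inflate a naive total. The record fields, IDs and amounts are all invented for illustration and are not Diavgeia's actual format.

```python
from collections import defaultdict

# HYPOTHETICAL payment records; field names are illustrative only.
# Decision "A-001" appears twice (e.g. original and corrected re-posting).
records = [
    {"decision_id": "A-001", "version": 1, "payee": "Smith", "amount": 12_000},
    {"decision_id": "A-001", "version": 2, "payee": "Smith", "amount": 12_000},
    {"decision_id": "B-042", "version": 1, "payee": "Smith", "amount": 8_000},
    {"decision_id": "C-007", "version": 1, "payee": "Jones", "amount": 5_000},
]

def naive_totals(rows):
    """Sum amounts per payee, treating every row as a separate payment."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["payee"]] += row["amount"]
    return dict(totals)

def deduplicated_totals(rows):
    """Keep only the latest version of each decision, then sum per payee."""
    latest = {}
    for row in rows:
        kept = latest.get(row["decision_id"])
        if kept is None or row["version"] > kept["version"]:
            latest[row["decision_id"]] = row
    return naive_totals(latest.values())

print(naive_totals(records)["Smith"])         # 32000
print(deduplicated_totals(records)["Smith"])  # 20000
```

Neither figure should be reported until it has been checked along a different path, for instance against the paying institution's own published accounts. Without such a path, the cross-check risk on the slide remains.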
Case 2: Leaked documents
Thousands of leaked documents from a government
Apply sentiment analysis to gauge the documents' overall tone
Show that all is gloomy!
Case 2: Risks
Tool idiosyncrasies: sentiment analysis is approx. 75% accurate
Lack of cross-check method
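A 75%-accurate tool can manufacture gloom on its own. Assume, as a simplification, that the tool is right with probability a on every document regardless of its true sentiment. If a true share p of documents is negative, the tool reports a·p + (1 − a)·(1 − p) as negative. The sketch below works this out and also inverts it. The symmetric-error assumption is the sketch's, not a property of any particular tool.

```python
def observed_negative_share(true_share, accuracy):
    """Expected share of documents labelled 'negative', assuming the tool is
    correct with probability `accuracy` on every document (symmetric errors)."""
    error = 1 - accuracy
    # Correctly labelled negatives + positives wrongly labelled negative.
    return accuracy * true_share + error * (1 - true_share)

def corrected_negative_share(observed_share, accuracy):
    """Invert the relation above to estimate the true negative share."""
    error = 1 - accuracy
    if accuracy == error:
        raise ValueError("a 50%-accurate tool carries no information")
    return (observed_share - error) / (accuracy - error)

for true_share in (0.0, 0.3, 0.5):
    seen = observed_negative_share(true_share, 0.75)
    print(f"true {true_share:.0%} negative -> tool reports {seen:.0%}")
```

With a = 0.75, a corpus with no negative documents would still show 25% "negative", and 30% becomes 40%. The inversion gives a rough correction when the accuracy is known. Noise can push its result outside [0, 1], so it is no substitute for checking a sample by hand.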
Data analysis
know what you look for
understand what you read
use your intuition (but not too much!)
double check (through a different path!)
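As an illustration of double-checking "through a different path", the sketch below rebuilds per-category totals from raw records and compares them with separately published subtotals. The data, category names and helper functions are hypothetical; the point is only that the second path does not reuse the numbers of the first.

```python
# Hypothetical spending records: (ministry, amount in euros)
records = [
    ("health", 1200.0),
    ("health", 800.0),
    ("education", 1500.0),
    ("education", 500.0),
    ("defence", 2000.0),
]

# Hypothetical subtotals, as printed in a separate summary table
published_subtotals = {"health": 2000.0, "education": 2000.0, "defence": 2500.0}


def subtotals_from_records(rows):
    """First path: rebuild each subtotal from the raw records."""
    totals = {}
    for ministry, amount in rows:
        totals[ministry] = totals.get(ministry, 0.0) + amount
    return totals


def mismatches(computed, published, tolerance=0.01):
    """Compare both paths; list (category, computed, published) where they disagree."""
    keys = sorted(set(computed) | set(published))
    return [
        (k, computed.get(k, 0.0), published.get(k, 0.0))
        for k in keys
        if abs(computed.get(k, 0.0) - published.get(k, 0.0)) > tolerance
    ]


print(mismatches(subtotals_from_records(records), published_subtotals))
# [('defence', 2000.0, 2500.0)]
```

A mismatch does not say which source is wrong; it says where to start asking questions.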
Visualization: keep it simple
keep it simple
keep it focused
verify
document and support (through references)
tell a story
think of your audience
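One quick way to "verify" a chart is to compare what it shows with what the data say. The sketch below computes how two bars compare visually when the y-axis does not start at zero, and Tufte's "lie factor" (the size of the effect shown in the graphic divided by the size of the effect in the data). The values 50 and 52 are made up for illustration.

```python
def apparent_ratio(a: float, b: float, axis_start: float) -> float:
    """Ratio of bar heights as drawn when the y-axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)


def lie_factor(a: float, b: float, axis_start: float) -> float:
    """Tufte's lie factor: effect shown in the graphic / effect in the data."""
    shown = apparent_ratio(a, b, axis_start) - 1   # relative change as drawn
    actual = b / a - 1                             # relative change in the data
    return shown / actual


print(apparent_ratio(50, 52, axis_start=0))           # 1.04: bars look almost equal
print(apparent_ratio(50, 52, axis_start=49))          # 3.0: second bar looks 3x taller
print(round(lie_factor(50, 52, axis_start=49), 1))    # 50.0
```

A 4% difference drawn on a truncated axis looks like a 200% jump, which is the kind of distortion a simple, verified chart avoids.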
Why you shouldn’t trust scientists (too much)
Hiding Bias
Reporting research
Self-fulfilling prophecy
A leader claims that the market will go down in a day
The market will go down
Beware of the index
An index of poverty
...based on the percentage of people who earn less than 5% of the maximum income
Is it a good idea?
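A small worked example suggests why such an index is risky: because the threshold is tied to the single highest income, a change at the very top can move the index while nobody else's situation changes. The incomes below and the `poverty_index` helper are hypothetical.

```python
def poverty_index(incomes: list[float], threshold_ratio: float = 0.05) -> float:
    """Percentage of people earning less than threshold_ratio * maximum income."""
    cutoff = threshold_ratio * max(incomes)
    poor = sum(income < cutoff for income in incomes)
    return 100 * poor / len(incomes)


# Hypothetical annual incomes (euros) for a population of ten people
incomes = [4_000, 6_000, 9_000, 12_000, 15_000, 18_000, 22_000, 30_000, 45_000, 200_000]
print(poverty_index(incomes))       # cutoff 10,000 -> 3 of 10 people -> 30.0

# Nobody else's income changes; only the richest person earns more
richer_top = incomes[:-1] + [1_000_000]
print(poverty_index(richer_top))    # cutoff 50,000 -> 9 of 10 people -> 90.0
```

"Poverty" triples although the nine poorest people earn exactly what they did before.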
What we should look for
Experimental setting
Confidence vs statistical confidence
Exact conclusion
Funding sources
Confirmation by experts
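The difference between "confidence" and statistical confidence can be made concrete with a confidence interval. The sketch below uses the simple normal (Wald) approximation for a proportion; the survey numbers are hypothetical, and for small samples or proportions near 0 or 1 a better interval (e.g. Wilson) would be preferable.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a proportion (Wald approximation)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical survey: 55 of 100 respondents agree
low, high = proportion_ci(55, 100)
print(f"55% agree, 95% CI: {low:.1%} to {high:.1%}")   # 45.2% to 64.8%
print("Majority supported?", low > 0.5)                # False: the interval includes 50%

# Same share, ten times the sample
low, high = proportion_ci(550, 1000)
print(f"55% agree, 95% CI: {low:.1%} to {high:.1%}")   # 51.9% to 58.1%
```

The headline "55% agree" is the same in both cases; only the second sample actually supports the claim that a majority agrees.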
How can I apply data analysis (if I am not an expert)?
Summary: things to remember
Locate sources
https://www.diavgeia.gov.gr/
http://data.gov.gr/
http://geodata.gov.gr/
http://open-data.okfn.gr/
https://delicious.com/
http://www.linkedopendata.gr/
https://data.europa.eu/euodp/
https://www.constituteproject.org/
http://www.findthedata.com/
Who should I work with?
Visualization experts
Data mining experts
Domain experts
Developers... with a twist
Data rules, while data rule
Data are useful, but can be misinterpreted
Data contain, support, and confirm stories
...but they cannot tell the story
Analysis of data needs experts
Visualization of data needs experts
Journalism needs experts
Working with data needs a team
Thank you
Data rules, while data rule
George Giannakopoulos (1, 2)
(ggianna@iit.demokritos.gr)
(1) SKEL Lab, NCSR “Demokritos”, Greece
(2) SciFY PNPC, Greece
2016
References I
Boisot, Max and Agustí Canals (2004). “Data, information and knowledge: have we got it right?” In: Journal of Evolutionary Economics 14.1, pp. 43–67. ISSN: 0936-9937, 1432-1386. DOI: 10.1007/s00191-003-0181-9 (cit. on pp. 12–14).
Tuomi, I. (1999). “Data is more than knowledge: implications of the reversed knowledge hierarchy for knowledge management and organizational memory”. In: IEEE Comput. Soc, p. 12. ISBN: 978-0-7695-0001-0. DOI: 10.1109/HICSS.1999.772795. URL: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=772795 (cit. on pp. 12–14).
Vlahavas, I. et al. (2002). Τεχνητή Νοημοσύνη [Artificial Intelligence]. Gartaganis Publications, Thessaloniki (cit. on pp. 12–14).