Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on May 12, 2015, at the 'SAS Forum Switzerland' in Zurich, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's introductory view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on May 13, 2014, at the `SMi Big Data in Pharma' conference in London, United Kingdom.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms `big data' and `data science'. This presentation gives a professional statistician's view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 8)Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on July 2, 2015, at 'Swiss Re' in Adliswil, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 5)Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on November 21, 2014, at the 'Research Seminar in Statistics' of the University of Geneva in Geneva, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Statistician's View on Big Data and Data Science in Pharmaceutical Developm...Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on October 13, 2014, at `F. Hoffmann-La Roche' in Basel, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors, as well as the pharmaceutical industry. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms `big data' and `data science'. This presentation gives a professional statistician's view on these terms in pharmaceutical development, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Swiss Statistician's 'Big Tent' View on Big Data and Data Science (Version 10)Prof. Dr. Diego Kuonen
Keynote talk given by Dr. Diego Kuonen, CStat PStat CSci, on October 21, 2015, at the `Austrian Statistics Days 2015' in Vienna, Austria.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional Swiss statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on August 26, 2014, at the `Zurich Machine Learning and Data Science' meetup in Zurich, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms `big data' and `data science'. This presentation gives a professional statistician's view on these terms and illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Swiss Statistician's 'Big Tent' Overview of Big Data and Data Science in Ph...Prof. Dr. Diego Kuonen
'President's Invited Speaker' keynote talk given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on August 22, 2016, at the '37th Annual Conference of the International Society for Clinical Biostatistics (ISCB)' in Birmingham, United Kingdom.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors, as well as pharmaceutical development. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional Swiss statistician's 'big tent' overview of these terms in pharmaceutical development, illustrates the connection between data science and statistics - the terms surrounding the 'sexiest job of the 21st century' - and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...Prof. Dr. Diego Kuonen
Presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on November 29, 2016, at the `University of Applied Sciences of Western Switzerland' (`Haute Ecole d'Ingénierie et de Gestion du Canton de Vaud', HEIG-VD) in Yverdon-les-Bains, Switzerland.
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on May 13, 2014, at the `SMi Big Data in Pharma' conference in London, United Kingdom.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms `big data' and `data science'. This presentation gives a professional statistician's view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 8)Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on July 2, 2015, at 'Swiss Re' in Adliswil, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 5)Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on November 21, 2014, at the 'Research Seminar in Statistics' of the University of Geneva in Geneva, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Statistician's View on Big Data and Data Science in Pharmaceutical Developm...Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on October 13, 2014, at `F. Hoffmann-La Roche' in Basel, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors, as well as the pharmaceutical industry. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms `big data' and `data science'. This presentation gives a professional statistician's view on these terms in pharmaceutical development, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Swiss Statistician's 'Big Tent' View on Big Data and Data Science (Version 10)Prof. Dr. Diego Kuonen
Keynote talk given by Dr. Diego Kuonen, CStat PStat CSci, on October 21, 2015, at the `Austrian Statistics Days 2015' in Vienna, Austria.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional Swiss statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on August 26, 2014, at the `Zurich Machine Learning and Data Science' meetup in Zurich, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms `big data' and `data science'. This presentation gives a professional statistician's view on these terms and illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Swiss Statistician's 'Big Tent' Overview of Big Data and Data Science in Ph...Prof. Dr. Diego Kuonen
'President's Invited Speaker' keynote talk given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on August 22, 2016, at the '37th Annual Conference of the International Society for Clinical Biostatistics (ISCB)' in Birmingham, United Kingdom.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors, as well as pharmaceutical development. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional Swiss statistician's 'big tent' overview of these terms in pharmaceutical development, illustrates the connection between data science and statistics - the terms surrounding the 'sexiest job of the 21st century' - and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...Prof. Dr. Diego Kuonen
Presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on November 29, 2016, at the `University of Applied Sciences of Western Switzerland' (`Haute Ecole d'Ingénierie et de Gestion du Canton de Vaud', HEIG-VD) in Yverdon-les-Bains, Switzerland.
Big Data, Data-Driven Decision Making and Statistics Towards Data-Informed Po...Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on October 20, 2015, at the Swiss Statistical Society's celebration of the `World Statistics Day 2015' in Olten, Switzerland.
Further information are available at https://worldstatisticsday.org/blog.html?c=CHE
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Demystifying Big Data, Data Science and Statistics, along with Machine Intell...Prof. Dr. Diego Kuonen
Presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on November 25, 2016, at the `Statistics at Nestlé in Switzerland' event of `Nestlé' in Vevey, Switzerland.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 6)Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on April 23, 2015, at the 'ZüKoSt: Seminar on Applied Statistics' of the ETH Zurich in Zurich, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 9)Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on October 1, 2015, at the `Joint SCITAS and Statistics Seminar' of the EPFL in Lausanne, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
The Power of Data Insights - Big Data as the Fuel and Analytics as the Engine...Prof. Dr. Diego Kuonen
Keynote presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on February 1, 2017, at the `Microsoft Vision Days - Intelligent Cloud' event of Microsoft Switzerland in Wallisellen, Switzerland.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Big Data as the Fuel and Analytics as the Engine of the Digital TransformationProf. Dr. Diego Kuonen
Keynote presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on June 13, 2017, at the `Information Builders Think Tank Lunch' in Zurich, Switzerland.
Data as Fuel and Analytics as Engine of the Digital Transformation: Demystic...Prof. Dr. Diego Kuonen
Invited presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on September 19, 2017, at the "ITU-Academia Partnership Meeting: Developing Skills for the Digital Era" in Budapest, Hungary.
See https://www.itu.int/en/ITU-D/Capacity-Building/Pages/events/academia2017.aspx
A Statistician's `Big Tent' View on Big Data and Data Science in Health Scien...Prof. Dr. Diego Kuonen
Presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on April 18, 2016, at the `Nestlé Institute of Health Sciences' in Lausanne, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors, as well as health sciences. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional Swiss statistician's 'big tent' view on these terms in health sciences, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Big Data, Data Science, Machine Intelligence and Learning: Demystification, T...Prof. Dr. Diego Kuonen
Keynote presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on March 14, 2017 at Eurostat's international conference `New Techniques and Technologies for Statistics (NTTS) 2017' in Brussels, Belgium.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Glocalised Smart Statistics and Analytics of Things: Core Challenges and Key ...Prof. Dr. Diego Kuonen
Invited presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on July 18, 2017 within Eurostat's special topic session `STS021: From Big Data to Smart Statistics' at the `61st ISI World Statistics Congress' (ISI2017) in Marrakech, Morocco.
(Big) Data as the Fuel and Analytics as the Engine of the Digital TransformationProf. Dr. Diego Kuonen
Webinar presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on March 15, 2018, within the TIBCO webinar entitled "Demystifying the Hype: [Big] Data as Fuel and Analytics as the Engine of Digital Transformation"; see
https://www.tibco.com/events/demystifying-hype-big-data-fuel-and-analytics-engine-digital-transformation
---------------
ABSTRACT
---------------
The digital revolution is truly underway: terms such as big data, cloud, internet of things, internet of everything, the fourth industrial revolution, smart cities and data economy are no longer just concepts - they are changing our lives in new and exciting ways.
Digital Transformation started with a first wave of digitalisation, which resulted in the (big) data revolution. But now a second wave of digitalisation is needed to enable learning from (big) data and to generate increased value for both business and society as a whole.
This presentation discusses how analytics, the science of "learning from data" or of "making sense out of data", becomes the engine of a new wave of Digital Transformation, and illustrates that the biggest challenge therein is the veracity of the "data pedigree", i.e. the trustworthiness of the data, including the reliability, capability, validity, and related quality of the data.
This presentation looks at demystifying concepts and terms surrounding Digital Transformation and big data. Along with machine intelligence and learning, the connection between data science and statistics is illustrated, and trends, challenges, opportunities, and the related digital skills and principles needed to succeed at Digital Transformation are highlighted.
Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital ...Prof. Dr. Diego Kuonen
Public keynote presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on July 7, 2017, in the context of the `CAS Data Visualization' of the `Bern University of the Arts' (HKB) in Berne, Switzerland.
See https://www.hkb.bfh.ch/de/weiterbildung/design/cas-data-visualization and http://bka.ch/worte/rubriken/worte/kein-datensalat
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on November 20, 2013, at the "IBM Developer Days 2013" in Zurich, Switzerland.
ABSTRACT
There is no question that big data has hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms big data and data science. This presentation gives a professional statistician's view on these terms and illustrates the connection between data science and statistics.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...Prof. Dr. Diego Kuonen
Keynote speech given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on February 28, 2019 at the "Swiss Cyber Security Days 2019" on February 27-28, 2019 in Fribourg, Switzerland; see https://swisscybersecuritydays.ch/.
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -...Prof. Dr. Diego Kuonen
Webinar presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on May 14, 2019, within the StatSoft webinar entitled "Data as the Fuel and Analytics as the Engine of the Digital Transformation - Demystication, Challenges, Opportunities and
Principles for Success"; see https://www.statsoft.de/en/dates/webinars/
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...Prof. Dr. Diego Kuonen
Keynote speech given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on June 26, 2018 at TIBCO's "Data Innovation Event" in Zurich, Switzerland; see https://www.tibco.com/events/tibco-data-innovation-event
Production Processes of Official Statistics & Data Innovation Processes Augme...Prof. Dr. Diego Kuonen
Keynote speech given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on May 15, 2018 at the conference "Big Data for European Statistics (BDES)" in Sofia, Bulgaria; see
https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/BDES_2018
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...Prof. Dr. Diego Kuonen
Keynote speech given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on June 7, 2018 at the "5th Swiss Conference on Data Science (SDS|2018)" in Berne, Switzerland; see https://sds2018.ch/.
These slides give a data science view of my MIT Ph.D. Thesis work, including a description of the data, the preprocessing/imaging methods, and the interpretation of the resulting seismic images in terms of subduction zone structure. The data preprocessing employs a novel unsupervised machine learning pipeline that computes high accuracy (and precision) receiver functions by iteratively applying multichannel cross-correlation, principal component analysis, and frequency-domain deconvolution.
Big Data, Data-Driven Decision Making and Statistics Towards Data-Informed Po...Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on October 20, 2015, at the Swiss Statistical Society's celebration of the `World Statistics Day 2015' in Olten, Switzerland.
Further information are available at https://worldstatisticsday.org/blog.html?c=CHE
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Demystifying Big Data, Data Science and Statistics, along with Machine Intell...Prof. Dr. Diego Kuonen
Presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on November 25, 2016, at the `Statistics at Nestlé in Switzerland' event of `Nestlé' in Vevey, Switzerland.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 6)Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on April 23, 2015, at the 'ZüKoSt: Seminar on Applied Statistics' of the ETH Zurich in Zurich, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 9)Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on October 1, 2015, at the `Joint SCITAS and Statistics Seminar' of the EPFL in Lausanne, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
The Power of Data Insights - Big Data as the Fuel and Analytics as the Engine...Prof. Dr. Diego Kuonen
Keynote presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on February 1, 2017, at the `Microsoft Vision Days - Intelligent Cloud' event of Microsoft Switzerland in Wallisellen, Switzerland.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Big Data as the Fuel and Analytics as the Engine of the Digital TransformationProf. Dr. Diego Kuonen
Keynote presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on June 13, 2017, at the `Information Builders Think Tank Lunch' in Zurich, Switzerland.
Data as Fuel and Analytics as Engine of the Digital Transformation: Demystic...Prof. Dr. Diego Kuonen
Invited presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on September 19, 2017, at the "ITU-Academia Partnership Meeting: Developing Skills for the Digital Era" in Budapest, Hungary.
See https://www.itu.int/en/ITU-D/Capacity-Building/Pages/events/academia2017.aspx
A Statistician's `Big Tent' View on Big Data and Data Science in Health Scien...Prof. Dr. Diego Kuonen
Presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on April 18, 2016, at the `Nestlé Institute of Health Sciences' in Lausanne, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors, as well as health sciences. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional Swiss statistician's 'big tent' view on these terms in health sciences, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Big Data, Data Science, Machine Intelligence and Learning: Demystification, T...Prof. Dr. Diego Kuonen
Keynote presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on March 14, 2017 at Eurostat's international conference `New Techniques and Technologies for Statistics (NTTS) 2017' in Brussels, Belgium.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Glocalised Smart Statistics and Analytics of Things: Core Challenges and Key ...Prof. Dr. Diego Kuonen
Invited presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on July 18, 2017 within Eurostat's special topic session `STS021: From Big Data to Smart Statistics' at the `61st ISI World Statistics Congress' (ISI2017) in Marrakech, Morocco.
(Big) Data as the Fuel and Analytics as the Engine of the Digital TransformationProf. Dr. Diego Kuonen
Webinar presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on March 15, 2018, within the TIBCO webinar entitled "Demystifying the Hype: [Big] Data as Fuel and Analytics as the Engine of Digital Transformation"; see
https://www.tibco.com/events/demystifying-hype-big-data-fuel-and-analytics-engine-digital-transformation
---------------
ABSTRACT
---------------
The digital revolution is truly underway: terms such as big data, cloud, internet of things, internet of everything, the fourth industrial revolution, smart cities and data economy are no longer just concepts - they are changing our lives in new and exciting ways.
Digital Transformation started with a first wave of digitalisation, which resulted in the (big) data revolution. But now a second wave of digitalisation is needed to enable learning from (big) data and to generate increased value for both business and society as a whole.
This presentation discusses how analytics, the science of "learning from data" or of "making sense out of data", becomes the engine of a new wave of Digital Transformation, and illustrates that the biggest challenge therein is the veracity of the "data pedigree", i.e. the trustworthiness of the data, including the reliability, capability, validity, and related quality of the data.
This presentation looks at demystifying concepts and terms surrounding Digital Transformation and big data. Along with machine intelligence and learning, the connection between data science and statistics is illustrated, and trends, challenges, opportunities, and the related digital skills and principles needed to succeed at Digital Transformation are highlighted.
Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital ...Prof. Dr. Diego Kuonen
Public keynote presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on July 7, 2017, in the context of the `CAS Data Visualization' of the `Bern University of the Arts' (HKB) in Berne, Switzerland.
See https://www.hkb.bfh.ch/de/weiterbildung/design/cas-data-visualization and http://bka.ch/worte/rubriken/worte/kein-datensalat
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on November 20, 2013, at the "IBM Developer Days 2013" in Zurich, Switzerland.
ABSTRACT
There is no question that big data has hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms big data and data science. This presentation gives a professional statistician's view on these terms and illustrates the connection between data science and statistics.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...Prof. Dr. Diego Kuonen
Keynote speech given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on February 28, 2019 at the "Swiss Cyber Security Days 2019" on February 27-28, 2019 in Fribourg, Switzerland; see https://swisscybersecuritydays.ch/.
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -...Prof. Dr. Diego Kuonen
Webinar presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on May 14, 2019, within the StatSoft webinar entitled "Data as the Fuel and Analytics as the Engine of the Digital Transformation - Demystication, Challenges, Opportunities and
Principles for Success"; see https://www.statsoft.de/en/dates/webinars/
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...Prof. Dr. Diego Kuonen
Keynote speech given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on June 26, 2018 at TIBCO's "Data Innovation Event" in Zurich, Switzerland; see https://www.tibco.com/events/tibco-data-innovation-event
Production Processes of Official Statistics & Data Innovation Processes Augme...Prof. Dr. Diego Kuonen
Keynote speech given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on May 15, 2018 at the conference "Big Data for European Statistics (BDES)" in Sofia, Bulgaria; see
https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/BDES_2018
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...Prof. Dr. Diego Kuonen
Keynote speech given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on June 7, 2018 at the "5th Swiss Conference on Data Science (SDS|2018)" in Berne, Switzerland; see https://sds2018.ch/.
These slides give a data science view of my MIT Ph.D. Thesis work, including a description of the data, the preprocessing/imaging methods, and the interpretation of the resulting seismic images in terms of subduction zone structure. The data preprocessing employs a novel unsupervised machine learning pipeline that computes high accuracy (and precision) receiver functions by iteratively applying multichannel cross-correlation, principal component analysis, and frequency-domain deconvolution.
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
Lance Olson. Cortana Analytics is a fully managed big data and advanced analytics suite that helps you transform your data into intelligent action. Come to this two-part session to learn how you can do "big data" processing and storage in Cortana Analytics. In the first part, we will provide an overview of the processing and storage services. We will then talk about the patterns and use cases which make up most big data solutions. In the second part, we will go hands-on, showing you how to get started today with writing batch/interactive queries, real-time stream processing, or NoSQL transactions all over the same repository of data. Crunch petabytes of data by scaling out your computation power to any sized cluster. Store any amount of unstructured data in its native format with no limits to file or account size. All of this can be done with no hardware to acquire or maintain and minimal time to setup giving you the value of "big data" within minutes. Go to https://channel9.msdn.com/ to find the recording of this session.
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
Lance Olson. Cortana Analytics is a fully managed big data and advanced analytics suite that helps you transform your data into intelligent action. Come to this two-part session to learn how you can do "big data" processing and storage in Cortana Analytics. In the first part, we will provide an overview of the processing and storage services. We will then talk about the patterns and use cases which make up most big data solutions. In the second part, we will go hands-on, showing you how to get started today with writing batch/interactive queries, real-time stream processing, or NoSQL transactions all over the same repository of data. Crunch petabytes of data by scaling out your computation power to any sized cluster. Store any amount of unstructured data in its native format with no limits to file or account size. All of this can be done with no hardware to acquire or maintain and minimal time to setup giving you the value of "big data" within minutes. Go to https://channel9.msdn.com/ to find the recording of this session.
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data. How can you prevent this from happening? Enter the modern data warehouse, which is able to handle and excel with these new trends. It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle “big data” and provide fast queries. Is there one appliance that can support this modern data warehouse? Yes! It is the Analytics Platform System (APS) from Microsoft (formally called Parallel Data Warehouse or PDW) , which is a Massively Parallel Processing (MPP) appliance that has been recently updated (v2 AU1). In this session I will dig into the details of the modern data warehouse and APS. I will give an overview of the APS hardware and software architecture, identify what makes APS different, and demonstrate the increased performance. In addition I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.
Azure Stream Analytics (ASA) is an Azure Service that enables real-time insights over streaming data from devices, sensors, infrastructure, and applications. In this presentation, we provide introduction to the service, common use cases, example customer scenarios, business benefits, and demo how to get started. We will quickly build a simple real time analytic application that uses an IoT device to ingest data (Event Hubs), process and analyze data (Stream Analytics) and visualize data (PowerBI).
Cortana Analytics Suite is a fully managed big data and advanced analytics suite that transforms your data into intelligent action. It is comprised of data storage, information management, machine learning, and business intelligence software in a single convenient monthly subscription. This presentation will cover all the products involved, how they work together, and use cases.
Today, an estimated 8.5% of the Belgian employees are at risk of burnout. This represents a huge potential cost to organizations and a huge personal risk for employees. Together with SD Worx, a large Belgian HR service provider, we succeeded in predicting expected future absenteeism, by combining historical absenteeism with softer measures like evaluations and surveys. At DISummit 2017, we shared the main conclusions of this exciting project, and we offer our view on why we succeeded in this challenge.
Myths and Mathemagical Superpowers of Data ScientistsDavid Pittman
Some people think data scientists are mythical beings, like unicorns, or they are some sort of nouveau fad that will quickly fade. Not true, says IBM big data evangelist James Kobielus. In this engaging presentation, with artwork created by Angela Tuminello, Kobielus debunks 10 myths about data scientists and their role in analytics and big data. You might also want to read the full blog by Kobielus that spawned this presentation: "Data Scientists: Myths and Mathemagical Superpowers" - http://ibm.co/PqF7Jn
For more information, visit http://www.ibmbigdatahub.com
Keynote on why statisticians should also be online marketers.
Read the full blog post here: http://crocstar.com/blog/2017/sorry-statisticians-but-marketing-is-your-job-too
Einstein published his ideas and became a pivotal element in shifting the way we think about physics - from the Newtonian model to the Quantum - in turn this changed the way we think about the world and allowed us to develop new ways of engaging with the world.
We are at a similar juncture. The development of computational technologies allows us to think about astronomical volumes of data and to make meaning of that data.
The mindshift that occurs is that “the machine is our friend”. The computer, like all machines, extends our capabilities. As a consequence the types of thinking now required in industry are those that get away from thinking like a computer and shift towards creative engagement with possibilities. Logical thinking is still necessary but it starts to be driven by imagination.
Computational thinking and data science change the way we think about defining and solving problems.
The age of creativity - which increasingly extends its impact from arts applications to business, scientific, technological, entrepreneurship, political, and other contexts.
Big data and the dark arts - Jisc Digital Media 2015Jisc
There still remains a certain misunderstanding by the very definition of "big data" and the perceived hype around the term. This workshop clarified the concepts and give examples of relevant big data projects.
Data Science Innovations is a guest lecture for the Advanced Data Analytics (an Introduction) course at the Advanced Analytics Institute at University of Technology Sydney
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
This is my presentation on the Topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled and collected different statistics and data from different sources. This may be useful for students and those who might be interested in this field of Study.
Similar to A Statistician's Introductory View on Big Data and Data Science (Version 7) (11)
This presentation by Morris Kleiner (University of Minnesota), was made during the discussion “Competition and Regulation in Professions and Occupations” held at the Working Party No. 2 on Competition and Regulation on 10 June 2024. More papers and presentations on the topic can be found out at oe.cd/crps.
This presentation was uploaded with the author’s consent.
Acorn Recovery: Restore IT infra within minutesIP ServerOne
Introducing Acorn Recovery as a Service, a simple, fast, and secure managed disaster recovery (DRaaS) by IP ServerOne. A DR solution that helps restore your IT infra within minutes.
0x01 - Newton's Third Law: Static vs. Dynamic AbusersOWASP Beja
f you offer a service on the web, odds are that someone will abuse it. Be it an API, a SaaS, a PaaS, or even a static website, someone somewhere will try to figure out a way to use it to their own needs. In this talk we'll compare measures that are effective against static attackers and how to battle a dynamic attacker who adapts to your counter-measures.
About the Speaker
===============
Diogo Sousa, Engineering Manager @ Canonical
An opinionated individual with an interest in cryptography and its intersection with secure software development.
This presentation, created by Syed Faiz ul Hassan, explores the profound influence of media on public perception and behavior. It delves into the evolution of media from oral traditions to modern digital and social media platforms. Key topics include the role of media in information propagation, socialization, crisis awareness, globalization, and education. The presentation also examines media influence through agenda setting, propaganda, and manipulative techniques used by advertisers and marketers. Furthermore, it highlights the impact of surveillance enabled by media technologies on personal behavior and preferences. Through this comprehensive overview, the presentation aims to shed light on how media shapes collective consciousness and public opinion.
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
A Statistician's Introductory View on Big Data and Data Science (Version 7)
1. A Statistician’s Introductory View
on Big Data and Data Science
(Version 7 as of 30.04.2015)
Dr. Diego Kuonen, CStat PStat CSci
Statoo Consulting
Statistical Consulting + Data Analysis + Data Mining Services
Morgenstrasse 129, 3018 Berne, Switzerland
@DiegoKuonen + kuonen@statoo.com + www.statoo.info
‘SAS Forum Switzerland’, Zurich, Switzerland — May 12, 2015
2. Abstract
There is no question that big data have hit the business, govern-
ment and scientific sectors. The demand for skills in data science
is unprecedented in sectors where value, competitiveness and effi-
ciency are driven by data. However, there is plenty of misleading
hype around the terms ‘big data’ and ‘data science’. This pre-
sentation gives a professional statistician’s introductory view on
these terms, illustrates the connection between data science and
statistics, and highlights some challenges and opportunities from
a statistical perspective.
3. About myself (about.me/DiegoKuonen)
• PhD in Statistics, Swiss Federal Institute of Technology (EPFL), Lausanne,
Switzerland.
• MSc in Mathematics, EPFL, Lausanne, Switzerland.
• CStat (‘Chartered Statistician’), Royal Statistical Society, United Kingdom.
• PStat (‘Accredited Professional Statistician’), American Statistical Association,
United States of America.
• CSci (‘Chartered Scientist’), Science Council, United Kingdom.
• Elected Member, International Statistical Institute, Netherlands.
• CEO & CAO, Statoo Consulting, Switzerland.
• Senior Lecturer in Business Analytics and Statistics, Geneva School of Economics
and Management, University of Geneva, Switzerland.
• President of the Swiss Statistical Society.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
2
5. About Statoo Consulting (www.statoo.info)
• Founded Statoo Consulting in 2001.
• Statoo Consulting is a software-vendor independent Swiss consulting firm spe-
cialised in statistical consulting and training, data analysis, data mining and big
data analytics services.
• Statoo Consulting offers consulting and training in statistical thinking, statistics,
data mining and big data analytics in English, French and German.
Have you ever been Statooed?
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
4
7. Contents
Contents 6
1. Demystifying the ‘big data’ hype 8
2. Demystifying the ‘data science’ hype 15
3. What distinguishes data science from statistics? 32
4. Conclusion and opportunities (not only for statisticians!) 37
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
6
8. ‘Data is arguably the most important natural resource
of this century. ... Big data is big news just about ev-
erywhere you go these days. Here in Texas, everything
is big, so we just call it data.’
Michael Dell, 2014
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
7
9. 1. Demystifying the ‘big data’ hype
• ‘Big data’ have hit the business, government and scientific sectors.
The term ‘big data’ — coined in 1997 by two researchers at the NASA — has
acquired the trappings of religion!
• But, what exactly are ‘big data’?
The term ‘big data’ applies to an accumulation of data that can not be
processed or handled using traditional data management processes or tools.
Big data are a data management architecture!
Most related challenges are IT focused!
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
8
10. • The following characteristics — ‘the four Vs’ or ‘V4
’ — provide a definition:
– ‘Volume’ : ‘data at rest’, i.e. the amount of data ( ‘data explosion problem’),
with respect to the number of observations ( ‘size’ of the data), but also with
respect to the number of variables ( ‘dimensionality’ of the data);
– ‘Variety’ : ‘data in many forms’, ‘mixed data’ or ‘broad data’, i.e. different
types of data (e.g. structured, semi-structured and unstructured, e.g. log files,
text, web or multimedia data such as images, videos, audio), data sources (e.g.
internal, external, open, public), data resolutions and data granularities;
– ‘Velocity’ : ‘data in motion’ or ‘fast data’, i.e. the speed by which data are
generated and need to be handled (e.g. streaming data from machines, sensors
and social data);
– ‘Veracity’ : ‘data in doubt’, i.e. the varying levels of noise and processing errors,
including the reliability, capability and validity of the data.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
9
11. • ‘Volume’ is often the least important issue: it is definitely not a requirement to
have a minimum of a petabyte of data, say.
Bigger challenges are ‘variety’ and ‘velocity’, and most important is ‘veracity’ and
the related quality and correctness of the data.
• The above definition of big data is vulnerable to the criticism of sceptics that these
four Vs have always been there.
Nevertheless, the definition provides a clear and concise ‘business’ framework to
communicate about how to solve different data processing challenges.
But, what is new?
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
10
12. “Big Data’ ... is the simple yet seemingly revolutionary
belief that data are valuable. ... I believe that ‘big’
actually means important (think big deal). Scientists
have long known that data could create new knowledge
but now the rest of the world, including government
and management in particular, has realised that data
can create value.’
Sean Patrick Murphy, 2013
Source: interview with Sean Patrick Murphy, a former senior scientist at Johns Hopkins University
Applied Physics Laboratory, in the Big Data Innovation Magazine, September 2013.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
11
13. ‘To get the full business value from big data, compa-
nies need to focus less on the three Vs of big data
(volume, variety, velocity) and more on the four Ms of
big data: ‘Make Me More Money’!’
Bill Schmarzo, March 2, 2015
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
12
14. ‘The term big data is starting to fall out of fashion,
people are looking for the next big thing, its some sort
of combination of the internet of things and data mon-
etization.’
Thomas H. Davenport, October 22, 2014
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
13
15. ‘Data are not taken for museum purposes; they are
taken as a basis for doing something. If nothing is to
be done with the data, then there is no use in collecting
any. The ultimate purpose of taking data is to pro-
vide a basis for action or a recommendation for action.’
W. Edwards Deming, 1942
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
14
16. 2. Demystifying the ‘data science’ hype
• The demand for ‘data scientists’ — the ‘magicians of the big data era’ — is
unprecedented in sectors where value, competitiveness and efficiency are driven by
data.
• The Data Science Association defined in October 2013 the term ‘data science’
within their Data Science Code of Professional Conduct as follows:
Data science is the scientific study of the creation, validation and trans-
formation of data to create meaning.
Source: www.datascienceassn.org/code-of-conduct.html
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
15
17. • ‘Data-driven decision making’ : the practice of basing decisions on the analysis of
data, rather than purely on intuition:
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
16
18. ‘One by one, the various crises which the world faces
become more obvious and the need for hard facts [facts
by analyzing data] on which to take sensible action be-
comes inescapable.’
George E. P. Box, 1976
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
17
19. • Data science has been dubbed by the Harvard Business Review (Thomas H. Dav-
enport and D. J. Patil, October 2012) as
‘the sexiest job in the 21st century ’
and by the New York Times (April 11, 2013) as a
‘ hot new field [that] promises to revolutionise industries
from business to government, health care to academia’.
But, is data science really new and ‘sexy’?
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
18
20. • The term ‘data science’ was originally coined in 1998 by the statistician Chien-Fu
Jeff Wu when he gave his inaugural lecture at the University of Michigan.
Wu argued that statisticians should be renamed data scientists since they spent
most of their time manipulating and experimenting with data.
• In 2001, the statistician William S. Cleveland introduced the notion of ‘data science’
as an independent discipline.
Cleveland extended the field of statistics to incorporate ‘advances in computing
with data’ in his article ‘Data science: an action plan for expanding the technical
areas of the field of statistics’ (International Statistical Review, 69, 21–26).
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
19
21. • Although the term ‘data scientist’ may be relatively new, this profession has existed
for a long time!
For example, Johannes Kepler published his first two laws of planetary motion,
which describe the motion of planets around the sun, in 1609.
Kepler found them by analysing the astronomical observations of Tycho Brahe.
Kepler was clearly a data scientist!
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
20
22. ‘Data science in a way is much older than Kepler — I
sometimes say that data science is the ‘second oldest’
occupation.’
Gregory Piatetsky-Shapiro, November 26, 2013
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
21
23. ‘I keep saying the sexy job in the next ten years will be
statisticians.’
Hal Varian, 2009
Source: interview with Google’s chief economist in the McKinsey Quarterly, January 2009.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
22
24. ‘And with ongoing advances in high-performance com-
puting and the explosion of data, ... I would venture
to say that statistician could be the sexy job of the
century.’
James (Jim) Goodnight, 2010
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
23
25. ‘I think he [Hal Varian] is behind — using statistics
has been the sexy job of the last 30 years. It has just
taken awhile for organisations to catch on.’
James (Jim) Goodnight, 2011
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
24
26. ‘The demand for competent statisticians who can tease
out the facts by analyzing data, planning investiga-
tions, and developing the necessary new theory and
techniques, will continue to increase.’
George E. P. Box, 1976
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
25
27. • Looking at all the ‘crazy’ hype around the terms ‘big data’ and ‘data science’, it
seems that ‘data scientist’ is just a ‘sexed up’ term for ‘statistician’.
It looks like statisticians just needed a good marketing campaign!
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
26
28. • But, what is ‘statistics’?
Statistics is the science of ‘learning from data’ (or of making sense out
of data), and of measuring, controlling and communicating uncertainty.
It is a process that includes everything from planning for the collection of data and
subsequent data management to end-of-the-line activities such as drawing conclusions
of data and presentation of results.
Uncertainty is measured in units of probability, which is the currency of statistics.
Statistics is concerned with the study of decision making in the face of uncertainty.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
27
29. • However, data science is not just a rebranding of statistics , large-scale statistics
or statistical science!
• Data science is rather a rebranding of ‘data mining’!
But, what is ‘data mining’?
Data mining is the non-trivial process of identifying valid (i.e. being
valid on new data in the face of uncertainty), novel, potentially useful, and
ultimately understandable patterns or structures or models or trends or re-
lationships in data to enable data-driven decision making.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
28
30. The data science (or data mining) Venn diagram
Source: Drew Conway, September 2010 (drewconway.com/zia/2013/3/26/the-data-science-venn-diagram).
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
29
31. ‘The awesome nerds’
Source: Patil, D. J. & Mason, H. (2015). Data Driven. Sebastopol, CA: O’Reilly Media.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
30
32. ‘It is already time to kill the ‘data scientist’ title. ...
The data scientist term has come to mean almost any-
thing.’
Thomas H. Davenport, 2014
Source: Thomas H. Davenport’s article ‘It is already time to kill the ‘data scientist’ title’
in the Wall Street Journal, April 30, 2014.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
31
33. 3. What distinguishes data science from statistics?
• Statistics traditionally is concerned with analysing primary (e.g. experimental) data
that have been collected to explain and check the validity of specific existing ideas
(hypotheses).
Primary data analysis or top-down (explanatory and confirmatory) analysis.
‘Idea (hypothesis) evaluation or testing’ .
• Data science (or data mining), on the other hand, typically is concerned with
analysing secondary (e.g. observational) data that have been collected for other
reasons (and not ‘under control’ of the investigator) to create new ideas (hypotheses).
Secondary data analysis or bottom-up (exploratory and predictive) analysis.
‘Idea (hypothesis) generation’ .
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
32
34. • The two approaches of ‘learning from data’ are complementary and should proceed
side by side — in order to enable proper data-driven decision making.
Example. When historical data are available the idea to be generated from a bottom-
up analysis (e.g. using a mixture of so-called ‘ensemble techniques’) could be
‘which are the most important (from a predictive point of view) factors
(among a ‘large’ list of candidate factors) that impact a given process out-
put (or a given KPI)?’.
Mixed with subject-matter knowledge this idea could result in a list of a ‘small’
number of factors (i.e. ‘the critical ones’).
The confirmatory tools of top-down analysis (statistical ‘Design Of Experiments’,
DOE, in most of the cases) could then be used to confirm and evaluate this idea.
By doing this, new data will be collected (about ‘all’ factors) and a bottom-up
analysis could be applied again — letting the data suggest new ideas to test.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
33
35. ‘Neither exploratory nor confirmatory is sufficient alone.
To try to replace either by the other is madness. We
need them both.’
John W. Tukey, 1980
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
34
36. Data-driven decision making and scientific investigation (Box, 1976)
Source: Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
35
37. ‘Statistics has been the most successful information
science. Those who ignore statistics are condemned
to re-invent it.’
Brad Efron, 1997
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
36
38. 4. Conclusion and opportunities (not only for statisticians!)
• Decision making that was once based on hunches and intuition should be driven by
data ( data-driven decision making).
• Despite an awful lot of marketing hype, big data are here to stay and will impact
every single ‘business’! The ‘age of big data’ could be a golden era for statistics.
• Statistical principles and rigour are necessary to justify the inferential leap from
data to knowledge. At the heart of extracting value from big data lies statistics!
• Lack of expertise in statistics can lead (and has already led) to fundamental errors!
• The key element for a successful (big) data analytics and data science (or data
mining) future are statistical principles and rigour of humans!
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
37
39. ‘We are in the era of big data, and big data needs
statisticians to make sense of it.’
Eric Schmidt and Jonathan Rosenberg, 2014
Source: Eric Schmidt, Google’s chairman and former CEO, and Jonathan Rosenberg, Google’s
former senior vice president of product, in their 2014 book How Google Works
(New York, NY: Grand Central Publishing — HowGoogleWorks.net).
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
38
40. • The challenges from a statistical perspective include
– the need to think really hard about a problem and to understand the underlying
mechanism that drive the processes that generate the data ( ask the ‘right’
question(s)!);
‘Far better an approximate answer to the ‘right’ ques-
tion , which is often vague, than an ‘exact’ answer
to the wrong questions, which can always be made
precise.’
John W. Tukey, 1962
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
39
41. – the provenance of the data, e.g. the quality of the data — including issues like
omissions, data linkage errors, measurement errors, censoring, missing observa-
tions, atypical observations, missing variables ( ‘omitted variable bias’), the
characteristics and heterogeneity of the sample — big data being ‘only’ a sample
(at a particular time) of a population of interest ( ‘sampling/selection bias’,
i.e. is the sample representative to the population it was designed for?);
– the ethics of using and linking (big) data, particularly in relation to personal data,
i.e. ethical issues related to privacy ( ‘information rules’ need to be defined),
confidentiality (of shared private information), transparency (e.g. of data uses
and data users) and identity (i.e. data should not compromise identity);
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
40
42. – the visualisation of the data, e.g. using and developing effective graphical pro-
cedures;
– spurious (false) associations ( ‘coincidence’ increases, i.e. it becomes more
likely, as sample size increases) versus valid causal relationships ( ‘confirmation
bias’);
– the identification of (and the controlling for) confounding factors;
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
41
43. – multiple statistical hypothesis testing for explanation (not prediction!) with tens
of thousands or even millions of tests performed simultaneously, often with com-
plex dependencies between tests (e.g. spatial or temporal dependence), and the
development of further statistically valid methods to solve large-scale simultane-
ous hypothesis testing problems;
– the dimensionality of the data ( ‘curse of dimensionality’, i.e. data become
more ‘sparse’ or spread out as the dimensionality increases), and the related usage
and development of statistically valid strategies for dimensionality reduction, e.g.
using ‘embedded’ variable subset selection methods like ‘ensemble techniques’;
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
42
44. – the validity of generalisation ( avoid ‘overfitting’), and the replicability and
reproducibility of findings;
– the nature of uncertainty (both random and systematic);
– the balance of humans and computers.
‘There’s no better way to lose money really quickly
than through automated analytics. So I think we have
to be very careful to not totally step out of the picture
and let things go awry.’
Thomas H. Davenport, April 30, 2014
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
43
45. ‘The numbers have no way of speaking for themselves.
We speak for them. We imbue them with meaning. ...
Before we demand more of our data, we need to de-
mand more of ourselves.’
Nate Silver, 2012
Source: Silver, N. (2012). The Signal and The Noise: Why Most Predictions Fail but Some Do Not.
New York, NY: The Penguin Press.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
44
46. ‘Most of my life I went to parties and heard a little
groan when people heard what I did. Now they are all
excited to meet me.’
Robert Tibshirani, 2012
Source: interview with Robert Tibshirani, a statistics professor at Stanford University,
in the New York Times, January 26, 2012.
Copyright c 2001–2015, Statoo Consulting, Switzerland. All rights reserved.
45
47. Have you been Statooed?
Dr. Diego Kuonen, CStat PStat CSci
Statoo Consulting
Morgenstrasse 129
3018 Berne
Switzerland
email kuonen@statoo.com
@DiegoKuonen
web www.statoo.info
/Statoo.Consulting
48. Copyright c 2001–2015 by Statoo Consulting, Switzerland. All rights reserved.
No part of this presentation may be reprinted, reproduced, stored in, or introduced
into a retrieval system or transmitted, in any form or by any means (electronic, me-
chanical, photocopying, recording, scanning or otherwise), without the prior written
permission of Statoo Consulting, Switzerland.
Warranty: none.
Trademarks: Statoo is a registered trademark of Statoo Consulting, Switzerland.
Other product names, company names, marks, logos and symbols referenced herein
may be trademarks or registered trademarks of their respective owners.
Presentation code: ‘SAS.Forum.May12.2015’.
Typesetting: LATEX, version 2 . PDF producer: pdfTEX, version 3.141592-1.40.3-2.2 (Web2C 7.5.6).
Compilation date: 30.04.2015.