Big Data Analytics: Understanding for Research Activity, by Andry Alamsyah
Big Data Analytics Presentation at International Workshop Colloquium Exploring Research Opportunity. School of Business and Management (SBM) - ITB. Bandung, 8 August 2019.
This video includes:
Purpose of Data Science, Role of Data Scientist, Skills required for Data Scientist, Job roles for Data Scientist, Applications of Data Science, Career in Data Science.
Computer science is faced with many challenges as the digital universe expands. From mobile and cloud computing to data security, addressing these issues can require large, structural changes, but an examination of these problems can lead to organizational solutions and improvements in the world.
A Review of Big Data for Social Policy Decision Making, by Ridi Fe
This presentation was given at ICSC 2018 at the University Club, Universitas Gadjah Mada. It discusses how big data and cloud computing enhance social policy decision making. This research is supported by the Microsoft Innovation Center UGM.
Data Science and Visualization Lab presentation, by iHub Research
The Data Science and Visualization Lab! This product is based on a component of research that delves into and innovates on the processes of data science: collection, storage/management, analysis, and visualization. You have probably come across one of our amazing infographics. What else can you do with data?
Open Data Analytical Model for Human Development Index to Support Government ..., by Andry Alamsyah
The transparency of Open Data helps citizens evaluate government performance. In Indonesia, each government body or ministry has its own standard operating procedure for data treatment, resulting in incoherent information between agencies and a likelihood of missing valuable insight. Our motivation is therefore to show the advantage of the Open Data movement in supporting unified government decision making. We use datasets from data.go.id, which publishes official data from each government body. The idea is that, using these official but limited data, we can find important patterns. The case study concerns Human Development Index value prediction and its clustered nature. We explore the data patterns using two important data analytics methods: classification and clustering. Data analytics is the collection of activities that reveal unknown data patterns. Specifically, we use Artificial Neural Network classification and K-means clustering. The classification objective is to categorize different levels of the Human Development Index of cities or regions in Indonesia based on Gross Domestic Product, Number of Population in Poverty, Number of Internet Users, Number of Laborers, and Number of Population indicator data. We determine which city belongs to each of the four categories of Human Development stated by the UNDP standard. The clustering objective is to find group characteristics relating the Human Development Index and Gross Domestic Product.
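As a rough illustration of the clustering step described above, here is a minimal K-means sketch. The data points are hypothetical (GDP, HDI)-style pairs invented for the example, not the data.go.id dataset:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal K-means: assign each point to its nearest centroid,
    then recompute centroids, repeating for a fixed number of rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # index of the nearest centroid by squared Euclidean distance
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, clusters

# Hypothetical (GDP, HDI) pairs for illustration only
points = [(1.0, 0.6), (1.2, 0.62), (5.0, 0.8), (5.3, 0.82)]
centroids, clusters = kmeans(points, k=2)
```

A production analysis would normalize the indicators first and choose k by inspecting cluster quality, but the assign-then-recompute loop is the whole of the method.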
In this presentation, I talk briefly about Big Data and its importance. I cover the very basics of Data Science and its present-day importance through a case study. You can also get an idea of who a data scientist is and what tasks they perform. A few applications of data science are illustrated at the end.
Slide 2: Etymology: The etymology of the term ‘Big Data’ can be traced back to the mid-1990s, when it was first used by John Mashey to refer to handling and analysis of massive datasets. However, by 2013, ‘Big Data’ was already being declared obsolescent as a meaningful term by some, as it was too wide ranging and vague in definition (e.g. de Goes, 2013).
Slide 6: Vagaries: Kitchin argues that it is velocity and these additional key characteristics that set Big Data apart and make them a “disruptive innovation – one that radically changes the nature of data and what can be done with them” (Kitchin, 2014). However, there is no one characteristic profile that all Big Data fit and they can take multiple forms.
Slide 8: Ethics: Several ethical questions have been raised about the scope of data being generated and retained; such as those concerning privacy, informed consent, and protection from harm.
These questions raise wider issues about what kinds of data should be combined and analysed, and the purposes to which the resulting information should be put.
Slide 9: Inequalities: Challenges of inequality have also been posed:
Whose data traces will be analysed? It is likely that only those who are better off will be represented (as they are more likely to use social media, etc.)
Access and use of open data is unlikely to be equally available to everyone due to existing structural inequalities (Eynon, 2013)
Slide 11: What do Big Data actually tell us? Eynon (2013) argues that Big Data is concerned with capturing and examining patterns, and tells us more about what people actually do than about what they say they do. However, this is not sufficient for all kinds of social science research. We need to understand the meanings of behaviours which cannot be inferred simply from tracking specific patterns.
In order that Big Data are used appropriately, we need to ensure understanding of what kinds of research can or cannot be carried out using them. Big Data should not be seen as [a] “technical fix” for research, but should be used to empower, support and facilitate practice and critical research.
Keynote talk by David Dietrich, EMC Education Services at ICCBDA 2013 : International Conference on Cloud and Big Data Analytics
http://twitter.com/imdaviddietrich
http://infocus.emc.com/author/david_dietrich/
Presentation slides used during the meetup on Artificial Intelligence and Its Ecosystem organized by Developer Session. In the presentation, I highlighted why open data is one of the key parts of the AI ecosystem, and the situation of Open Data in Nepal.
Abstract:
Big Data concern large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including physical, biological and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.
This presentation is an introduction to the importance of Data Analytics in Product Management. During this talk, Etugo Nwokah, former Chief Product Officer for WellMatch, covered how to define Data Analytics and why it should be a first-class citizen in any software organization.
Here's how big data and the Internet of Things work together: a vast network of sensors (IoT) collects enormous amounts of information (big data) that is then used to improve services and products in various industries, which in turn generates revenue.
Lecture 1: Introduction to Machine Learning, by UmmeSalmaM1
Machine Learning is a field of computer science which deals with the study of computer algorithms that improve automatically through experience. In this PPT we discuss the following concepts - Prerequisite, Definition, Introduction to Machine Learning (ML), Fields associated with ML, Need for ML, Difference between Artificial Intelligence, Machine Learning, Deep Learning, Types of learning in ML, Applications of ML, Limitations of Machine Learning.
Machine Learning: Need of Machine Learning, Its Challenges and Its Applications, by Arpana Awasthi
The BCA Department of JIMS Vasant Kunj-II is one of the best BCA colleges in Delhi NCR. The curriculum is well updated and the subjects include all the latest technologies which are in demand.
JIMS BCA course teaches Python to II semester students and Artificial Intelligence Using Python to Sixth Semester students.
Here is a small article on the Future of Machine Learning, hope you will find it useful.
Machine Learning is a field of computer science in which computer systems are able to learn from past experiences, examples, and environments. With the help of various machine learning algorithms, computers are given the ability to sense data and produce relevant results.
Machine learning algorithms provide techniques for predicting future outcomes or classifying information from the given input so that appropriate decisions can be taken.
A brief introduction to Data Science, explaining the concepts, algorithms, machine learning, supervised and unsupervised learning, clustering, statistics, data preprocessing, real-world applications, etc.
It is part of a Data Science Corner campaign where I will be discussing the fundamentals of Data Science, AI/ML, Statistics, etc.
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task. Machine learning algorithms build a mathematical model of sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in applications such as email filtering, detection of network intruders, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory, and application domains to the field of machine learning. Data mining is a field of study within machine learning that focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is the study of computer systems that learn from data and experience. It is applied in an incredibly wide variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need to make sense of data is a potential customer of machine learning.
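The phrase "build a mathematical model of sample data" can be made concrete with the smallest possible example: fitting a line to a handful of hypothetical training pairs and using it to predict. The numbers here are invented for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b: the simplest
    'model built from training data' in the sense described above."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical training data: y is roughly 2x + 1 with small noise
xs, ys = [0, 1, 2, 3], [1.0, 3.1, 4.9, 7.0]
a, b = fit_line(xs, ys)

def predict(x):
    return a * x + b
```

No rule about the relationship between x and y was hardcoded; the slope and intercept are recovered from the data, which is the essence of the "without being explicitly programmed" claim.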
Gary Hope - Machine Learning: It's Not as Hard as You Think, by Saratoga
Gary Hope is currently the Data Platform Technical Specialist within Microsoft South Africa having previously worked for several large organisations including American Express and Siemens Business Solutions.
Slides from talks presented at Mammoth BI in Cape Town on 17 November 2014.
Visit www.mammothbi.co.za for details on the event. Follow @MammothBI on twitter.
what-is-machine-learning-and-its-importance-in-todays-world.pdf, by Temok IT Services
Machine Learning is an AI method for teaching computers to learn from experience. By employing computational methods, machine learning algorithms can “learn” directly from data without using a predetermined equation as a model.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph : SHORT REPORT / NOTES, by Subhajit Sahu
These notes concern graph algorithms such as PageRank, and Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map):
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce):
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce):
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce):
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23..., by John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ..., by Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
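For reference, the "Monolithic PageRank" baseline the abstract compares against can be sketched as a standard power iteration. This is not the levelwise method itself, and the uniform dead-end handling below is just one common convention:

```python
def pagerank(graph, damping=0.85, iters=50):
    """Standard (monolithic) PageRank by power iteration.
    `graph` maps each vertex to its list of out-neighbours."""
    verts = list(graph)
    n = len(verts)
    rank = {v: 1.0 / n for v in verts}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in verts}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:
                # dead end: redistribute its rank uniformly
                for u in verts:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# Tiny 3-vertex cycle: symmetry forces equal ranks of 1/3 each
g = {"a": ["b"], "b": ["c"], "c": ["a"]}
r = pagerank(g)
```

The levelwise variant avoids processing every vertex in every iteration by ranking one strongly connected component level at a time, which is why dead ends must be handled up front.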
Slide 2: Outline
1. Introduction
2. Big Data Timeline
3. Machine Learning Fundamentals
4. Machine Learning Use Case Examples
5. Machine Learning System Types
6. Data Science Project Workflow
7. Book Recommendations
Slide 3: About Me
Chouaieb Nemri: Data Engineer at Devoteam and Machine Learning Instructor at OpenClassrooms. I write about Data Engineering and Machine Learning on Medium: c-nemri.medium.com. 3x AWS certified, soon GCP and Kubernetes certified. Husband and father of 2 boys; disability rights advocate. MS ECE from Georgia Tech (Fall 2015); ENSEEIHT, Electrical Engineering (2016). Switched careers in 2019; in the past I worked at General Electric (2 years) and Groupe RENAULT (1 year). I am a Data Engineer with a strong interest in AI, Cloud Computing, and applied mathematics, passionate about ingesting, processing, exploring, and modeling data to make it tell compelling stories.
Slide 4: Introduction
Each year, the USA has about 1.6M new cases of cancer and 600k deaths from cancer. Breast cancer is the most common cancer: each year the USA has about 250k new cases and 40k deaths. More than 60 cancer drugs, among them tamoxifen citrate, have been approved by the FDA.
Before the Big Data era, tamoxifen was initially thought to have a success rate of 80%. After the Big Data era came a life-changing finding: tamoxifen is 100% effective on 80% of people and 0% effective on the other 20%.
How does it work? Machine learning and data mining techniques are used to identify genetic markers that tell us which treatment will be best for which patient.
An opportunity: thanks to more and more data, computing resources, and sophisticated algorithms, medicine, along with many other sectors, has been reshaped, and new concepts have appeared (e.g. personalized medicine). Knowing who are the 5% for which a drug will be effective is life-saving.
Slide 5: Big Data Timeline
1991: the World Wide Web is born.
1995: Sun releases the Java platform; GPS becomes fully operational.
1998: Google is founded; NoSQL.
1999: the term IoT is invented at MIT.
2002: the Bluetooth v1.1 spec is released.
2008: the number of devices connected to the Internet exceeds the world's population.
2012: "Data Scientist" is called the sexiest job of the 21st century.
(The slide also marks 2001, 2003, 2004, 2005, and 2006 as milestone years; the detailed timeline at the end of this document expands on each.)
Slide 6: Traditional vs Machine Learning approach
Definition: "ML is a field of study that gives computers the ability to learn without being explicitly programmed." (Arthur Samuel, 1959). A more engineering-oriented definition: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." (Tom Mitchell, 1997)
Spam filter example.
Rule-based approach:
1. Manually detect patterns in an email's name, body, or domain name.
2. Write hardcoded rules for every pattern detected.
3. Repeat 1 and 2 until you're good to go.
Result: complex, non-exhaustive code that is hard to maintain.
Machine learning approach:
1. Collect spam data.
2. Train an ML model that automatically learns unusually frequent patterns of words in the spam examples compared to the ham examples.
3. Update the training data and retrain periodically.
Slide 7: Traditional vs Machine Learning approach (continued): recap of the Samuel and Mitchell definitions and of the rule-based vs machine learning approaches from Slide 6.
Slide 8: Machine Learning Algorithms and Use Cases
CNN (Convolutional Neural Nets): analyzing images of products on a production line to classify them automatically; detecting tumors in brain scans.
RNN (Recurrent Neural Nets): automatically classifying news articles; automatically flagging offensive comments on discussion forums; chatbots; text translation; time-series forecasting.
Reinforcement Learning: robotics; intelligent bots.
Clustering: client segmentation.
Recommender Systems: Netflix; Amazon; dating apps, ...
Anomaly Detection: credit card fraud detection; cybersecurity; log message anomaly detection.
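As one concrete instance of the anomaly-detection use case listed above, here is a minimal z-score detector. It is a deliberately simple stand-in for what a fraud system would actually deploy, and the transaction amounts are hypothetical:

```python
def zscore_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations
    from the mean of the sample."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical transaction amounts with one obvious outlier
amounts = [12.0, 9.5, 11.2, 10.8, 10.1, 9.9, 500.0]
```

A single large outlier inflates the standard deviation, which is why the threshold here is modest; production systems typically use robust statistics or learned models instead of the raw z-score.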
1991 ■ The Internet, or World Wide Web as we know it, is born. The protocol Hypertext Transfer Protocol (HTTP) becomes the standard means for sharing information in this new medium.
1995 ■ Sun releases the Java platform. Java, invented in 1991, has become the second most popular language behind C. It dominates the Web applications space and is the de facto standard for middle-tier applications. These applications are the source for recording and storing web traffic. ■ Global Positioning System (GPS) becomes fully operational. GPS was originally developed by DARPA (Defense Advanced Research Projects Agency) for military applications in the early 1970s.
1998 ■ Carlo Strozzi develops an open‐source relational database and calls it NoSQL. Ten years later, a movement to develop NoSQL databases to work with large, unstructured data sets gains momentum. ■ Google is founded by Larry Page and Sergey Brin, who worked for about a year on a Stanford search engine project called BackRub.
1999 ■ Kevin Ashton, cofounder of the Auto‐ID Center at the Massachusetts Institute of Technology (MIT), invents the term “the Internet of Things.”
2001 ■ Wikipedia is launched. The crowd-sourced encyclopedia revolutionized the way people reference information.
2002 ■ Version 1.1 of the Bluetooth specification is released by the Institute of Electrical and Electronics Engineers (IEEE). Bluetooth is a wireless technology standard for the transfer of data over short distances. The advancement of this specification and its adoption lead to a whole host of wearable devices that communicate between the device and another computer. Today nearly every portable device has a Bluetooth receiver.
2003 ■ According to studies by IDC and EMC, the amount of data created in 2003 surpasses the amount of data created in all of human history before then. It is estimated that 1.8 zettabytes (ZB) was created in 2011 alone (1.8 ZB is the equivalent of 200 billion high-definition movies, each two hours long, or 47 million years of footage with no bathroom breaks). ■ LinkedIn, the popular social networking website for professionals, launches. In 2013, the site had about 260 million users.
2004 ■ Wikipedia reaches 500,000 articles in February; seven months later it tops 1 million articles. ■ Facebook, the social networking service, is founded by Mark Zuckerberg and others in Cambridge, Massachusetts. In 2013, the site had more than 1.15 billion users.
2005 ■ The Apache Hadoop project is created by Doug Cutting and Mike Cafarella. The name for the project came from the toy elephant of Cutting's young son. The now-famous yellow elephant becomes a household word just a few years later and a foundational part of almost all big data strategies. ■ The National Science Board recommends that the National Science Foundation (NSF) create a career path for "a sufficient number of high-quality data scientists" to manage the growing collection of digital information.
2007 ■ Apple releases the iPhone and creates a strong consumer market for smartphones
2008 ■ The number of devices connected to the Internet exceeds the world’s population.
2012 ■ The Obama administration announces the Big Data Research and Development Initiative, consisting of 84 programs in six departments. The NSF publishes “Core Techniques and Technologies for Advancing Big Data Science & Engineering.” ■ IDC and EMC estimate that 2.8 ZB of data will be created in 2012 but that only 3% of what could be usable for big data is tagged and less is analyzed. The report predicts that the digital world will by 2020 hold 40 ZB, 57 times the number of grains of sand on all the beaches in the world. ■ The Harvard Business Review calls the job of data scientist “the sexiest job of the 21st century.”
2013 ■ The democratization of data begins. With smartphones, tablets, and Wi‐Fi, everyone generates data at prodigious rates. More individuals access large volumes of public data and put data to creative use.