Stefano Cavuoti
INAF – Capodimonte Astronomical Observatory – Napoli
Oversimplification:
Cloud computing is computing based on the
internet.
Where in the past, people would run applications or programs from
software downloaded on a physical computer or server in their
building, cloud computing allows people access to the same kinds of
applications through the internet.
When you update your Facebook status, you’re
using cloud computing. Checking your bank
balance on your phone? You’re in the cloud
again.
In short, cloud is fast becoming the new
normal. It was estimated that by the end of
2016, 90% of UK businesses would be using at
least one cloud service.
Why are so many businesses moving to the
cloud? It’s because cloud computing increases
efficiency, helps improve cash flow and offers
several other benefits.
Cloud Computing is a general term used to describe a new class
of network-based computing that takes place over the Internet:
• basically a further step from Utility Computing;
• a collection/group of integrated and networked hardware,
software and Internet infrastructure (called a platform);
• it uses the Internet for communication and transport, and
provides hardware, software and networking services to
clients.
These platforms hide the complexity and details of the underlying
infrastructure from users and applications by providing a very
simple graphical interface or API (Application Programming
Interface).
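A minimal sketch of what "hiding the infrastructure behind a simple API" can look like. The class `CloudStore` and its methods are hypothetical, invented for illustration; each in-memory dict merely stands in for one storage machine.

```python
class CloudStore:
    """Hypothetical facade: callers see put/get, never the machines behind it."""

    def __init__(self, replicas=3):
        # Each dict stands in for one storage machine in the provider's data centre.
        self._nodes = [dict() for _ in range(replicas)]
        self._down = set()

    def put(self, key, value):
        # Replicate the value to every node.
        for node in self._nodes:
            node[key] = value

    def get(self, key):
        # Answer from the first node that is still up and holds the key.
        for index, node in enumerate(self._nodes):
            if index not in self._down and key in node:
                return node[key]
        raise KeyError(key)

    def fail_node(self, index):
        """Simulate a machine crash inside the cloud."""
        self._down.add(index)


store = CloudStore()
store.put("thesis.txt", "draft 3")
store.fail_node(0)                 # one machine dies...
print(store.get("thesis.txt"))     # ...and the caller does not notice
```

The caller never learns how many machines exist or which one answered; that is the point of the interface.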
The use of the cloud provides a number of
opportunities:
— It enables services to be used without any understanding
of their infrastructure.
— Cloud computing works using economies of scale:
— It potentially lowers the outlay expense for start-up companies,
as they would no longer need to buy their own software or
servers.
— Costs would follow on-demand pricing.
— Vendors and service providers recover their costs by establishing an
ongoing revenue stream.
— Data and services are stored remotely but accessible
from “anywhere”.
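The trade-off between an upfront purchase and on-demand pricing can be sketched as a break-even calculation. All figures below are made-up illustrative numbers, not real prices, and the function name is hypothetical.

```python
def months_to_break_even(upfront_cost, monthly_running_cost, cloud_monthly_cost):
    """Return the first month in which owning is cheaper in total than renting,
    or None if renting never becomes more expensive."""
    if cloud_monthly_cost <= monthly_running_cost:
        return None  # the rented service never overtakes the owned one in cost
    month = 0
    own_total, cloud_total = upfront_cost, 0
    while own_total >= cloud_total:
        month += 1
        own_total += monthly_running_cost
        cloud_total += cloud_monthly_cost
    return month


# Hypothetical start-up: a 12,000 server costing 100/month to run,
# versus 600/month of on-demand cloud capacity.
print(months_to_break_even(12000, 100, 600))
```

Until the break-even month the cloud option needs no capital outlay at all, which is the argument made above for start-ups.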
In parallel there has been backlash against cloud computing:
— Use of cloud computing means dependence on others and that
could possibly limit flexibility and innovation:
— The others are likely to be the bigger Internet companies, like
Google and IBM, who may monopolise the market.
— Some argue that this use of supercomputers is a return to the
time of mainframe computing that the PC was a reaction
against.
— Security could prove to be a big issue:
— It is still unclear how safe outsourced data is, and when using
these services ownership of the data is not always clear.
— There are also issues relating to policy and access:
— If your data is stored abroad, whose policy do you adhere to?
— What happens if the remote server goes down?
— How will you then access files?
— There have been cases of users being locked out of accounts and
losing access to data.
— Lower computer costs:
— You do not need a high-powered and high-priced computer to run cloud
computing's web-based applications.
— Since applications run in the cloud, not on the desktop PC, your desktop PC
does not need the processing power or hard disk space demanded by
traditional desktop software.
— When you are using web-based applications, your PC can be less expensive,
with a smaller hard disk, less memory, a more efficient processor...
— In fact, your PC in this scenario does not even need a CD or DVD drive, as
no software programs have to be loaded and no document files need to be
saved.
— Reduced software costs:
— Instead of purchasing expensive software applications, you can get most of
what you need for free!
— Most cloud computing applications today, such as the Google Docs
suite, are free.
— That is better than paying for similar commercial software,
— which alone may be justification for switching to cloud applications.
— Instant software updates:
— Another advantage to cloud computing is that you are no longer
faced with choosing between obsolete software and high upgrade
costs.
— When the application is web-based, updates happen automatically
— available the next time you log into the cloud.
— When you access a web-based application, you get the latest version
— without needing to pay for or to download an upgrade.
— Improved performance:
— With fewer large programs hogging your computer's memory, you will
see better performance from your PC.
— Computers in a cloud computing system boot and run faster
because they have fewer programs and processes loaded into
memory…
— Unlimited storage capacity:
— Cloud computing offers virtually limitless storage.
— Your computer's current 1 TB hard drive is small compared
to the hundreds of PB available in the cloud.
— Increased data reliability:
— Unlike desktop computing, in which a hard disk crash
can destroy all your valuable data, a computer crashing in the
cloud should not affect the storage of your data.
— If your personal computer crashes, all your data is still out there in
the cloud, still accessible.
— In a world where few individual desktop PC users back up
their data on a regular basis, cloud computing is a data-safe
computing platform!
— Universal document access:
— Leaving an important document on another machine is not a problem
with cloud computing, because you do not take your documents with you.
— Instead, they stay in the cloud, and you can access them whenever
you have a computer and an Internet connection
— Documents are instantly available from wherever you are
— Latest version availability:
— When you edit a document at home, that edited version is what you
see when you access the document at work.
— The cloud always hosts the latest version of your documents
— as long as you are connected, you are not in danger of having an
outdated version
— Improved document format compatibility.
— You do not have to worry about the documents you create on your
machine being compatible with other users' applications or OSes
— There are potentially no format incompatibilities when everyone is
sharing documents and applications in the cloud.
— Easier group collaboration:
— Sharing documents leads directly to better collaboration.
— For many users this is one of the most important advantages of
cloud computing:
— multiple users can collaborate easily on documents and projects
— Device independence.
— You are no longer tethered to a single computer or
network.
— Changes to computers, applications and documents
follow you through the cloud.
— Move to a portable device, and your applications and
documents are still available.
— Requires a constant Internet connection:
— Cloud computing is impossible if you cannot connect to
the Internet.
— Since you use the Internet to connect to both your
applications and documents, if you do not have an
Internet connection you cannot access anything, even
your own documents.
— A dead Internet connection means no work and in areas
where Internet connections are few or inherently
unreliable, this could be a deal-breaker.
— Does not work well with low-speed connections:
— Similarly, a low-speed Internet connection, such as that
found with dial-up services, makes cloud computing
painful at best and often impossible.
— Web-based applications require a lot of bandwidth to
download, as do large documents.
— Features might be limited:
— This situation is bound to change, but today many web-based
applications simply are not as full-featured as
their desktop-based counterparts.
— For example, you can do a lot more with Microsoft PowerPoint
than with Google Presentation's web-based offering
— Can be slow:
— Even with a fast connection, web-based applications can
sometimes be slower than accessing a similar software
program on your desktop PC.
— Everything about the program, from the interface to the
current document, has to be sent back and forth from
your computer to the computers in the cloud.
— If the cloud servers happen to be backed up at that
moment, or if the Internet is having a slow day, you
would not get the instantaneous access you might
expect from desktop applications.
— Stored data might not be secure:
— With cloud computing, all your data are stored on the
cloud.
— The question is: how secure is the cloud?
— Can unauthorised users gain access to your confidential
data?
— Stored data can be lost:
— Theoretically, data stored in the cloud are safe, replicated
across multiple machines.
— But on the off chance that your data goes missing, you have
no physical or local backup.
— Put simply, relying on the cloud puts you at risk if the cloud lets you
down.
— HPC Systems:
— It is not clear that you can run compute-intensive HPC applications
that use MPI/OpenMP!
— Scheduling is important with this type of application
— as you want all the VMs to be co-located to minimize communication
latency!
— General Concerns:
— Each cloud system uses different protocols and different APIs
— it may not be possible to run applications across cloud-based systems
— Amazon has created its own DB system (not SQL-92) and its own
workflow system (although there are many popular workflow systems out there)
— so your normal applications will have to be adapted to execute on these
platforms.
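One common way to limit the cost of adapting an application is to code it against a neutral interface and write one thin adapter per provider. The sketch below is only an illustration: `ProviderA` and `ProviderB` are invented stand-ins with deliberately different call styles, not real vendor libraries.

```python
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """Neutral interface the application codes against (hypothetical)."""

    @abstractmethod
    def save(self, key, value): ...

    @abstractmethod
    def load(self, key): ...


class ProviderA:
    """Stand-in for one vendor's client library, with its own method names."""
    def __init__(self):
        self._items = {}

    def put_item(self, name, body):
        self._items[name] = body

    def get_item(self, name):
        return self._items[name]


class ProviderB:
    """Stand-in for a second vendor with a row-oriented call style."""
    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(row)

    def query(self, key):
        return [r["v"] for r in self._rows if r["k"] == key]


class ProviderAStore(KeyValueStore):
    def __init__(self, client):
        self._client = client

    def save(self, key, value):
        self._client.put_item(key, value)

    def load(self, key):
        return self._client.get_item(key)


class ProviderBStore(KeyValueStore):
    def __init__(self, client):
        self._client = client

    def save(self, key, value):
        self._client.insert({"k": key, "v": value})

    def load(self, key):
        return self._client.query(key)[-1]  # latest write wins


def run_app(store):
    """Application logic: unchanged whichever provider sits underneath."""
    store.save("catalogue", "v1")
    store.save("catalogue", "v2")
    return store.load("catalogue")


results = [run_app(ProviderAStore(ProviderA())), run_app(ProviderBStore(ProviderB()))]
print(results)
```

Only the two small adapter classes know about provider-specific calls; `run_app` would not change if a third provider were added.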
Big Data is like teenage sex:
everyone talks about it,
nobody really knows how to do it,
everyone thinks everyone else is doing it,
so everyone claims they are doing it…
Dan Ariely
You take the Blue Pill,
The story ends. You wake up in your bed and believe whatever you want to believe.
You take the Red Pill,
You stay in Wonderland and I show You how deep the rabbit hole goes
I'm only offering You the TRUTH... Nothing more.
SKA – first light planned for 2020 –
will produce at least 1.5 PB/day
(100 PB/day is foreseeable)
Great! But it is just a number...
What does 1.5 PB mean???
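One way to make the number concrete is to convert it into a sustained data rate. This is a back-of-the-envelope sketch that assumes decimal units (1 PB = 10^15 bytes) and a perfectly even flow over 24 hours; the 1 TB drive is just a familiar yardstick.

```python
PB = 10**15              # bytes in a petabyte (decimal convention, an assumption)
TB = 10**12              # bytes in a terabyte
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = 1.5 * PB
bytes_per_second = daily_bytes / SECONDS_PER_DAY
gigabits_per_second = bytes_per_second * 8 / 10**9
one_tb_drives_per_day = daily_bytes / TB

print(f"{bytes_per_second / 10**9:.1f} GB/s sustained")
print(f"{gigabits_per_second:.0f} Gbit/s sustained")
print(f"{one_tb_drives_per_day:.0f} one-terabyte drives filled per day")
```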
SKA WILL ALSO FILL ABOUT
1,000,000,000 AMAZON
KINDLES PER DAY
The largest library in the world is
the Library of Congress,
Washington, D.C., USA with
ONLY 30,000,000 books…
The US Census Bureau (December
2010) estimate for 2020 is 7.7
billion people…
So to SEE each day the amount of SKA data, each
person in the world would have to read about 100,000
books per day…
ARE YOU READY FOR THIS???
AND THIS IS JUST ONE SURVEY!!!
I've seen things you people
wouldn't believe. Attack
ships on fire off the shoulder of
Orion. I've watched c-beams
glitter in the dark near the
Tannhäuser Gate. All those
... moments will be lost in
time, like tears...in rain.
Time to die…
What is big data?
Big Data is
any thing
which is
crash Excel.
Small Data is
when is fit in RAM.
Big Data is when is
crash because is
not fit in RAM.
Or, in other words, Big Data is data
in volumes too great to process by
traditional methods.
— Volume
— data volumes are becoming unmanageable
— Variety
— data complexity is growing
— more types of data captured than previously
— Velocity
— some data is arriving so rapidly that it must
either be processed instantly, or lost
— this is a whole subfield called “stream
processing”
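A minimal sketch of the stream-processing idea: each value is folded into running statistics and then dropped, so memory use stays constant however long the stream runs. The update rule is Welford's online algorithm; the class name and the `sensor_readings` generator are hypothetical.

```python
class RunningStats:
    """Keeps count, mean and variance without storing the stream (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance of everything seen so far.
        return self._m2 / self.n if self.n else 0.0


def sensor_readings():
    """Hypothetical stream; a generator, so no reading is kept after use."""
    for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
        yield value


stats = RunningStats()
for reading in sensor_readings():
    stats.update(reading)   # process instantly, then let the value go

print(stats.n, stats.mean, stats.variance)
```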
— Big quantities of data are acquired everywhere.
— It is now a big issue in all aspects of life: science, business,
healthcare, government, social networks, national security, media,
etc.
How much data are there in the world?
• From the beginning of recorded time until 2003,
we created 5 billion gigabytes (5 exabytes) of data.
• In 2011 the same amount was created every two days.
• In 2013 the same amount was created every 10 minutes.
• Computing power doubles every 18 months (Moore’s Law) ...
• 100x improvement in 10 years
• The amount of data doubles every year (or faster!) ...
• 1000x in 10 years, and 1,000,000x in 20 yrs.
• I/O bandwidth increases ~10% / year
• <3x improvement in 10 years.
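The growth factors quoted above follow from simple compounding, which a few lines can check. The function names are illustrative only.

```python
def growth_factor(doubling_time_years, years):
    """Growth after `years` for a quantity that doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)


def compound_growth(rate_per_year, years):
    """Growth after `years` at a fixed yearly rate (0.10 means 10% per year)."""
    return (1 + rate_per_year) ** years


print(f"Computing power, 10 years: {growth_factor(1.5, 10):.0f}x")
print(f"Data volume, 10 years:     {growth_factor(1, 10):.0f}x")
print(f"Data volume, 20 years:     {growth_factor(1, 20):.0f}x")
print(f"I/O bandwidth, 10 years:   {compound_growth(0.10, 10):.2f}x")
```

The gap between the second and fourth lines is the real message: data volume outgrows the ability to move it by several orders of magnitude per decade.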
— Price/performance of disks is improving faster
than that of computing (Moore’s law): a factor of
~100 over 10 years!
— Disks are now already cheaper than paper
— Network bandwidth used to grow even faster, but
no longer does
— And most telcos are bankrupt …
— Sneakernet is faster than any network
— Disks make data preservation easier as the
storage technology evolves
— Can you still read your 10 (5?) year old tapes?
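The sneakernet claim can be checked with a rough comparison. The figures below (500 TB of data, a 1 Gbit/s link, a 24-hour courier) are assumptions chosen for illustration, and the time needed to copy data on and off the disks is ignored.

```python
def network_hours(terabytes, gigabits_per_second):
    """Hours needed to push `terabytes` through a link of the given speed."""
    bits = terabytes * 10**12 * 8
    return bits / (gigabits_per_second * 10**9) / 3600


data_tb = 500          # assumed data set size
courier_hours = 24     # assumed door-to-door shipping time for a box of disks

# Bandwidth the courier effectively delivers: all the bits divided by the trip time.
effective_gbps = data_tb * 10**12 * 8 / (courier_hours * 3600) / 10**9

print(f"network at 1 Gbit/s: {network_hours(data_tb, 1):.0f} h")
print(f"courier:             {courier_hours} h")
print(f"effective courier bandwidth: {effective_gbps:.1f} Gbit/s")
```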
— Do not move data over the network: bring the
computation to data!
— The Beowulf paradigm: Datawulf clusters, smart disks …
— The Grid paradigm (done right): move only the questions
and answers, and the flow control
— You will learn to use databases!
— Everybody needs efficient database techniques and data mining (DM):
searches, trends & correlations,
anomaly/outlier detection,
clustering/classification, summarization,
visualization, etc.
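"Move only the questions and answers" can be sketched with Python's built-in `sqlite3` module. The table and column names are invented for illustration, and the in-memory database merely stands in for a remote archive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, band TEXT, magnitude REAL)")
rows = [(i, "g" if i % 2 else "r", 15.0 + (i % 10) * 0.5) for i in range(10_000)]
conn.executemany("INSERT INTO sources VALUES (?, ?, ?)", rows)

# Moving the data: every row crosses the connection, then Python does the work.
all_rows = conn.execute("SELECT band, magnitude FROM sources").fetchall()
sums = {}
for band, mag in all_rows:
    total, count = sums.get(band, (0.0, 0))
    sums[band] = (total + mag, count + 1)
client_side = {band: total / count for band, (total, count) in sums.items()}

# Moving the question: the database aggregates and returns one row per band.
server_side = dict(conn.execute(
    "SELECT band, AVG(magnitude) FROM sources GROUP BY band"))

print(len(all_rows), "rows moved vs", len(server_side), "rows moved")
```

Both approaches give the same averages; the second ships two rows instead of ten thousand.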
— Clustering
— Association learning
— Parameter estimation
— Recommendation engines
— Classification
— Similarity matching
— Neural networks
— Bayesian networks
— Genetic algorithms
— Etc.
Graphic provided by Professor S. G. Djorgovski, Caltech
• Clustering – examine
the data and find the
data clusters (clouds),
without considering
what the items are =
Characterization !
• Classification – for
each new data item,
try to place it within a
known class (i.e., a
known category or
cluster) = Classify !
• Outlier Detection –
identify those data
items that don’t fit into
the known classes or
clusters = Surprise !
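The three tasks can be sketched on one-dimensional toy data. This is a deliberately tiny k-means written for illustration (it assumes k ≥ 2); the data values and the `max_distance` threshold are arbitrary assumptions.

```python
def nearest(value, centroids):
    """Index of the centroid closest to `value`."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def kmeans_1d(values, k=2, iterations=10):
    """Tiny k-means on scalars; centroids start evenly spread over the data range."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v, centroids)].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def is_outlier(value, centroids, max_distance):
    """True if `value` is farther than `max_distance` from every known cluster."""
    return abs(value - centroids[nearest(value, centroids)]) > max_distance


data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids = kmeans_1d(data)                                # clustering: find the groups
label = nearest(4.6, centroids)                            # classification: place a new item
surprise = is_outlier(12.0, centroids, max_distance=1.0)   # outlier detection
print(centroids, label, surprise)
```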
— Supervised
— we have training data with correct answers
— use training data to prepare the algorithm
— then apply it to data without a known answer
— Unsupervised
— no training data
— throw data into the algorithm, hope it makes
some kind of sense out of the data
— Prediction
— predicting a variable from data
— Classification
— assigning records to predefined groups
— Clustering
— splitting records into groups based on similarity
— Association learning
— seeing what often appears together with what
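Association learning in its simplest form is counting which items co-occur. A minimal sketch on invented shopping baskets follows; the 50% support threshold is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

# Count every unordered pair of items that appears together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep pairs whose support (share of baskets containing both) reaches the threshold.
min_support = 0.5
frequent = {pair: n / len(baskets)
            for pair, n in pair_counts.items()
            if n / len(baskets) >= min_support}
print(frequent)
```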
— Data is usually noisy in some way
— imprecise input values
— hidden/latent input values
— Inductive bias
— basically, the shape of the algorithm we choose:
— may not fit the data at all
— may induce underfitting or overfitting
— Machine learning without inductive bias is not
possible
— Underfitting: using an algorithm that cannot capture the full
complexity of the data
— Overfitting: tuning the algorithm so carefully that it starts
matching the noise in the training data
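Both failure modes can be shown on toy data. In this sketch the true relation is assumed to be y = 2x, the "noise" is a fixed alternating ±1 so the run is reproducible, and the three models (a constant, a least-squares line, a memoriser) are chosen only to illustrate too little, suitable and too much flexibility.

```python
def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data: y = 2x plus alternating +1/-1 noise.
train_x = list(range(10))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]
# Test data: unseen x values with noise-free answers.
test_x = [x + 0.4 for x in range(9)]
test_y = [2 * x for x in test_x]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))


def constant(x):
    """Underfitting: a constant cannot express any trend."""
    return mean_y


def line(x):
    """Least-squares straight line: an inductive bias that matches the data."""
    return mean_y + slope * (x - mean_x)


def memoriser(x):
    """Overfitting: return the stored answer of the nearest training point, noise included."""
    return train_y[min(train_x, key=lambda t: abs(t - x))]


for name, model in [("constant", constant), ("line", line), ("memoriser", memoriser)]:
    print(f"{name:9s} train MSE {mse(model, train_x, train_y):6.2f}"
          f"  test MSE {mse(model, test_x, test_y):6.2f}")
```

The memoriser is perfect on the training set and worse than the line on unseen data; the constant is poor on both.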
http://dilbert.com/strips/comic/2012-09-05/
http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
— Job opportunities are sky-rocketing
— Extremely high demand for Data Science skills
— Demand will continue to increase
— Old: “100 applicants per job”. Future: “100 jobs per applicant”
McKinsey Report (2011**) :
• Big Data is the new “gold rush” , the “new oil”
• A shortage of 1.5 million skilled data scientists within 3 years
• ** http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation
• Hardly anyone knows this stuff
• It’s a big field, with lots and lots of theory
• And it’s all maths, so it’s tricky to learn
[Figure: labelled animal anatomy: nostrils, bill, frontal shield, eye, ear, fore feet, hind feet, tail]
[Figure: data science skill areas: science, math, statistics, visualization, machine learning, data mining, programming, hacking]
http://www.slideshare.net/Hadoop_Summit/scaling-big-data-mining-infrastructure-twitter-experience/12
Well, in conclusion… we have not concluded yet (and never will);
in reality, we have just started…
• Big data are already here for some use cases and just around
the corner for many other cases
THE CORE IS:
For the Red Pill consumers: YES
Big Data are changing the way we do science and/or business.
A new era of analysis has started.
For the Blue Pill consumers:
Don’t worry, tomorrow you will forget everything; you will just have a
little déjà vu…
N-N-N-NO TIME, NO TIME, NO TIME!
HELLO, GOOD BYE,
I AM LATE, I AM LATE….
JUST TIME FOR A FEW QUESTIONS!

More Related Content

What's hot

Cloud Computing, an online storage house
Cloud Computing, an online storage houseCloud Computing, an online storage house
Cloud Computing, an online storage house
Manoj Khetan
 
Project Report on Cloud Storage
Project Report on Cloud StorageProject Report on Cloud Storage
Project Report on Cloud Storage
RachitSinghal17
 
How Your Business Can Take Advantage Of Cloud Computing
How Your Business Can Take Advantage Of Cloud ComputingHow Your Business Can Take Advantage Of Cloud Computing
How Your Business Can Take Advantage Of Cloud Computing
Andy Harjanto
 
Boosting Team Productivity By Getting Them Addicted to POT
Boosting Team Productivity By Getting Them Addicted to POTBoosting Team Productivity By Getting Them Addicted to POT
Boosting Team Productivity By Getting Them Addicted to POT
Andy Harjanto
 
Working in the Cloud: An Overview
Working in the Cloud: An OverviewWorking in the Cloud: An Overview
Working in the Cloud: An Overview
Jose Briones
 
Intro Cloud Computing
Intro Cloud ComputingIntro Cloud Computing
Intro Cloud Computing
DSK Chakravarthy
 
Online productivity tools - SILS20090
Online productivity tools - SILS20090Online productivity tools - SILS20090
Online productivity tools - SILS20090
is20090
 
Cloudy with a chance of genealogy
Cloudy with a chance of genealogyCloudy with a chance of genealogy
Cloudy with a chance of genealogy
Dick Eastman
 
The Revolution Will Not Be Centralized
The Revolution Will Not Be CentralizedThe Revolution Will Not Be Centralized
The Revolution Will Not Be Centralized
Edward Faulkner
 
International journal of computer science and innovation vol 2015-n2-paper2
International journal of computer science and innovation  vol 2015-n2-paper2International journal of computer science and innovation  vol 2015-n2-paper2
International journal of computer science and innovation vol 2015-n2-paper2
sophiabelthome
 
Good Cloud Bad Cloud NG security summit june 2011
Good Cloud Bad Cloud NG security summit june 2011Good Cloud Bad Cloud NG security summit june 2011
Good Cloud Bad Cloud NG security summit june 2011
graywilliams
 
Cloud computing
Cloud computingCloud computing
Cloud computing
Islam Shaheen
 
Network media presentation
Network media presentationNetwork media presentation
Network media presentationssatchell
 
Network media presentation
Network media presentationNetwork media presentation
Network media presentationssatchell
 
The Cloud Presentation 2016
The Cloud Presentation 2016The Cloud Presentation 2016
The Cloud Presentation 2016Joel Kline
 
What is cloud computing?
What is cloud computing?What is cloud computing?
What is cloud computing?
Florizel Media
 
Doing Less for Fun and Profit (by switching to the cloud)
Doing Less for Fun and Profit (by switching to the cloud)Doing Less for Fun and Profit (by switching to the cloud)
Doing Less for Fun and Profit (by switching to the cloud)
Luke Chavers
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
Datta Dharanikota
 
Cloud computing presentation
Cloud computing   presentationCloud computing   presentation
Cloud computing presentation
William Mann
 

What's hot (20)

Cloud Computing, an online storage house
Cloud Computing, an online storage houseCloud Computing, an online storage house
Cloud Computing, an online storage house
 
Project Report on Cloud Storage
Project Report on Cloud StorageProject Report on Cloud Storage
Project Report on Cloud Storage
 
How Your Business Can Take Advantage Of Cloud Computing
How Your Business Can Take Advantage Of Cloud ComputingHow Your Business Can Take Advantage Of Cloud Computing
How Your Business Can Take Advantage Of Cloud Computing
 
Boosting Team Productivity By Getting Them Addicted to POT
Boosting Team Productivity By Getting Them Addicted to POTBoosting Team Productivity By Getting Them Addicted to POT
Boosting Team Productivity By Getting Them Addicted to POT
 
Working in the Cloud: An Overview
Working in the Cloud: An OverviewWorking in the Cloud: An Overview
Working in the Cloud: An Overview
 
Intro Cloud Computing
Intro Cloud ComputingIntro Cloud Computing
Intro Cloud Computing
 
Online productivity tools - SILS20090
Online productivity tools - SILS20090Online productivity tools - SILS20090
Online productivity tools - SILS20090
 
Cloudy with a chance of genealogy
Cloudy with a chance of genealogyCloudy with a chance of genealogy
Cloudy with a chance of genealogy
 
The Revolution Will Not Be Centralized
The Revolution Will Not Be CentralizedThe Revolution Will Not Be Centralized
The Revolution Will Not Be Centralized
 
International journal of computer science and innovation vol 2015-n2-paper2
International journal of computer science and innovation  vol 2015-n2-paper2International journal of computer science and innovation  vol 2015-n2-paper2
International journal of computer science and innovation vol 2015-n2-paper2
 
Good Cloud Bad Cloud NG security summit june 2011
Good Cloud Bad Cloud NG security summit june 2011Good Cloud Bad Cloud NG security summit june 2011
Good Cloud Bad Cloud NG security summit june 2011
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
Network media presentation
Network media presentationNetwork media presentation
Network media presentation
 
Network media presentation
Network media presentationNetwork media presentation
Network media presentation
 
The Cloud Presentation 2016
The Cloud Presentation 2016The Cloud Presentation 2016
The Cloud Presentation 2016
 
What is cloud computing?
What is cloud computing?What is cloud computing?
What is cloud computing?
 
Doing Less for Fun and Profit (by switching to the cloud)
Doing Less for Fun and Profit (by switching to the cloud)Doing Less for Fun and Profit (by switching to the cloud)
Doing Less for Fun and Profit (by switching to the cloud)
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
Cloud computing presentation
Cloud computing   presentationCloud computing   presentation
Cloud computing presentation
 

Viewers also liked

06 ashish mahabal bse2
06 ashish mahabal bse206 ashish mahabal bse2
06 ashish mahabal bse2
Marco Quartulli
 
08 visualisation seminar ver0.2
08 visualisation seminar   ver0.208 visualisation seminar   ver0.2
08 visualisation seminar ver0.2
Marco Quartulli
 
05 astrostat feigelson
05 astrostat feigelson05 astrostat feigelson
05 astrostat feigelson
Marco Quartulli
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
Marco Quartulli
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
Marco Quartulli
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reduction
Marco Quartulli
 
07 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_201607 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_2016
Marco Quartulli
 
08 distributed optimization
08 distributed optimization08 distributed optimization
08 distributed optimization
Marco Quartulli
 
05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extraction05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extraction
Marco Quartulli
 
06 ashish mahabal bse1
06 ashish mahabal bse106 ashish mahabal bse1
06 ashish mahabal bse1
Marco Quartulli
 
06 ashish mahabal bse3
06 ashish mahabal bse306 ashish mahabal bse3
06 ashish mahabal bse3
Marco Quartulli
 
Overview of Bigdata Analytics
Overview of Bigdata Analytics Overview of Bigdata Analytics
Overview of Bigdata Analytics
Sankarapu Anjaneyulu
 
Viet stack 2nd meetup - BigData in Cloud Computing
Viet stack 2nd meetup - BigData in Cloud ComputingViet stack 2nd meetup - BigData in Cloud Computing
Viet stack 2nd meetup - BigData in Cloud Computing
Vietnam Open Infrastructure User Group
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data Warehousing
Thomas Kejser
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
Amrit Chhetri
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
Bernard Marr
 

Viewers also liked (17)

06 ashish mahabal bse2
06 ashish mahabal bse206 ashish mahabal bse2
06 ashish mahabal bse2
 
08 visualisation seminar ver0.2
08 visualisation seminar   ver0.208 visualisation seminar   ver0.2
08 visualisation seminar ver0.2
 
05 astrostat feigelson
05 astrostat feigelson05 astrostat feigelson
05 astrostat feigelson
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reduction
 
07 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_201607 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_2016
 
08 distributed optimization
08 distributed optimization08 distributed optimization
08 distributed optimization
 
05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extraction05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extraction
 
06 ashish mahabal bse1
06 ashish mahabal bse106 ashish mahabal bse1
06 ashish mahabal bse1
 
06 ashish mahabal bse3
06 ashish mahabal bse306 ashish mahabal bse3
06 ashish mahabal bse3
 
Overview of Bigdata Analytics
Overview of Bigdata Analytics Overview of Bigdata Analytics
Overview of Bigdata Analytics
 
Viet stack 2nd meetup - BigData in Cloud Computing
Viet stack 2nd meetup - BigData in Cloud ComputingViet stack 2nd meetup - BigData in Cloud Computing
Viet stack 2nd meetup - BigData in Cloud Computing
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data Warehousing
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
 

Similar to 04 bigdata and_cloud_computing

History and Evolution of Cloud computing (Safaricom cloud)
History and Evolution of Cloud computing (Safaricom cloud)History and Evolution of Cloud computing (Safaricom cloud)
History and Evolution of Cloud computing (Safaricom cloud)
Ben Wakhungu
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
John Joseph
 
Learn Cloud Computing.pdf
Learn Cloud Computing.pdfLearn Cloud Computing.pdf
Learn Cloud Computing.pdf
DevOps University
 
Fog computing
Fog computingFog computing
Fog computing
Rishabh Kumar ☁️
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
Vipin Subhash
 
Cloud computing presentation
Cloud computing presentationCloud computing presentation
Cloud computing presentationPriyanka Sharma
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
Samit Kumar Kapat
 
Cloud computing
Cloud computingCloud computing
Cloud computing
Madhav Reddy Chintapalli
 
My Presentation on Cloud Computing
My Presentation on Cloud ComputingMy Presentation on Cloud Computing
My Presentation on Cloud Computing
Pravin Sable
 
Cloud operating system
Cloud operating systemCloud operating system
Cloud operating system
sadakpramodh
 
Cloud computing & its Basic
Cloud computing & its BasicCloud computing & its Basic
Cloud computing & its Basic
Patrik Soleman
 
Cloud operating system
Cloud operating systemCloud operating system
Cloud operating system
sadak pramodh
 
Cloud concepts-and-technologies
Cloud concepts-and-technologiesCloud concepts-and-technologies
Cloud concepts-and-technologies
Parag Patil
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
gangal
 
Overview of Cloud Computing New.pptx
Overview of Cloud Computing New.pptxOverview of Cloud Computing New.pptx
Overview of Cloud Computing New.pptx
vishal choudhary
 
Cloud computing
Cloud computingCloud computing
Cloud computing
Mayur Verma
 
cloud computing is the delivery of computing services
cloud computing is the delivery of computing servicescloud computing is the delivery of computing services
cloud computing is the delivery of computing services
abidhassan225
 
basic concept of Cloud computing and its architecture
basic concept of Cloud computing  and its architecturebasic concept of Cloud computing  and its architecture
basic concept of Cloud computing and its architecture
Mohammad Ilyas Malik
 
Cloud storage or computing & its working
Cloud storage or computing & its workingCloud storage or computing & its working
Cloud storage or computing & its working
piyush mishra
 

Similar to 04 bigdata and_cloud_computing (20)

History and Evolution of Cloud computing (Safaricom cloud)
History and Evolution of Cloud computing (Safaricom cloud)History and Evolution of Cloud computing (Safaricom cloud)
History and Evolution of Cloud computing (Safaricom cloud)
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Learn Cloud Computing.pdf



04 bigdata and_cloud_computing

  • 1. Stefano Cavuoti INAF – Capodimonte Astronomical Observatory – Napoli
  • 2. Oversimplification: Cloud computing is computing based on the internet. Where in the past, people would run applications or programs from software downloaded on a physical computer or server in their building, cloud computing allows people access to the same kinds of applications through the internet.
  • 3. When you update your Facebook status, you’re using cloud computing. Checking your bank balance on your phone? You’re in the cloud again. In short, cloud is fast becoming the new normal. It was estimated that by the end of 2016, 90% of UK businesses would be using at least one cloud service. Why are so many businesses moving to the cloud? It’s because cloud computing increases efficiency, helps improve cash flow and offers several other benefits.
  • 4. Cloud Computing is a general term used to describe a new class of network-based computing that takes place over the Internet: • basically a further step from Utility Computing; • a collection/group of integrated and networked hardware, software and Internet infrastructure (called a platform); • using the Internet for communication and transport, it provides hardware, software and networking services to clients. These platforms hide the complexity and details of the underlying infrastructure from users and applications by providing a very simple graphical interface or API (Application Programming Interface).
  • 5. The use of the cloud provides a number of opportunities: — It enables services to be used without any understanding of their infrastructure. — Cloud computing works using economies of scale: — It potentially lowers the outlay expense for start-up companies, as they would no longer need to buy their own software or servers. — Costs would follow on-demand pricing. — Vendors and service providers recover their costs by establishing an ongoing revenue stream. — Data and services are stored remotely but accessible from “anywhere”.
  • 6. In parallel there has been backlash against cloud computing: — Use of cloud computing means dependence on others and that could possibly limit flexibility and innovation: — The others are likely to be the bigger Internet companies like Google and IBM, who may monopolise the market. — Some argue that this use of supercomputers is a return to the time of mainframe computing that the PC was a reaction against. — Security could prove to be a big issue: — It is still unclear how safe outsourced data is, and when using these services ownership of data is not always clear. — There are also issues relating to policy and access: — If your data is stored abroad, whose policy do you adhere to? — What happens if the remote server goes down? — How will you then access files? — There have been cases of users being locked out of accounts and losing access to data.
  • 7. — Lower computer costs: — You do not need a high-powered and high-priced computer to run cloud computing's web-based applications. — Since applications run in the cloud, not on the desktop PC, your desktop PC does not need the processing power or hard disk space demanded by traditional desktop software. — When you are using web-based applications, your PC can be less expensive, with a smaller hard disk, less memory, a more efficient processor... — In fact, your PC in this scenario does not even need a CD or DVD drive, as no software programs have to be loaded and no document files need to be saved. — Reduced software costs: — Instead of purchasing expensive software applications, you can get most of what you need for free! — Most cloud computing applications today, such as the Google Docs suite, are free. — That is better than paying for similar commercial software, which alone may be justification for switching to cloud applications.
  • 8. — Instant software updates: — Another advantage to cloud computing is that you are no longer faced with choosing between obsolete software and high upgrade costs. — When the application is web-based, updates happen automatically and are available the next time you log into the cloud. — When you access a web-based application, you get the latest version without needing to pay for or to download an upgrade. — Improved performance: — With fewer large programs hogging your computer's memory, you will see better performance from your PC. — Computers in a cloud computing system boot and run faster because they have fewer programs and processes loaded into memory…
  • 9. — Unlimited storage capacity: — Cloud computing offers virtually limitless storage. — Your computer's current 1 TB hard drive is small compared to the hundreds of PB available in the cloud. — Increased data reliability: — Unlike desktop computing, in which a hard disk crash can destroy all your valuable data, a computer crashing in the cloud should not affect the storage of your data. — If your personal computer crashes, all your data is still out there in the cloud, still accessible. — In a world where few individual desktop PC users back up their data on a regular basis, cloud computing is a data-safe computing platform!
  • 10. — Universal document access: — Leaving a document on another machine is not a problem with cloud computing, because you do not take your documents with you. — Instead, they stay in the cloud, and you can access them whenever you have a computer and an Internet connection. — Documents are instantly available from wherever you are. — Latest version availability: — When you edit a document at home, that edited version is what you see when you access the document at work. — The cloud always hosts the latest version of your documents. — As long as you are connected, you are not in danger of having an outdated version. — Improved document format compatibility: — You do not have to worry about the documents you create on your machine being compatible with other users' applications or OSes. — There are potentially no format incompatibilities when everyone is sharing documents and applications in the cloud.
  • 11. — Easier group collaboration: — Sharing documents leads directly to better collaboration. — For many users, this is an important advantage of cloud computing. — Multiple users can collaborate easily on documents and projects. — Device independence: — You are no longer tethered to a single computer or network. — Changes to computers, applications and documents follow you through the cloud. — Move to a portable device, and your applications and documents are still available.
  • 12. — Requires a constant Internet connection: — Cloud computing is impossible if you cannot connect to the Internet. — Since you use the Internet to connect to both your applications and documents, if you do not have an Internet connection you cannot access anything, even your own documents. — A dead Internet connection means no work, and in areas where Internet connections are few or inherently unreliable, this could be a deal-breaker.
  • 13. — Does not work well with low-speed connections: — Similarly, a low-speed Internet connection, such as that found with dial-up services, makes cloud computing painful at best and often impossible. — Web-based applications require a lot of bandwidth to download, as do large documents. — Features might be limited: — This situation is bound to change, but today many web-based applications simply are not as full-featured as their desktop-based counterparts. — For example, you can do a lot more with Microsoft PowerPoint than with Google Presentation's web-based offering.
  • 14. — Can be slow: — Even with a fast connection, web-based applications can sometimes be slower than accessing a similar software program on your desktop PC. — Everything about the program, from the interface to the current document, has to be sent back and forth from your computer to the computers in the cloud. — If the cloud servers happen to be backed up at that moment, or if the Internet is having a slow day, you will not get the instantaneous access you might expect from desktop applications.
  • 15. — Stored data might not be secure: — With cloud computing, all your data are stored in the cloud. — The question is: how secure is the cloud? — Can unauthorised users gain access to your confidential data? — Stored data can be lost: — Theoretically, data stored in the cloud are safe, replicated across multiple machines. — But on the off chance that your data go missing, you have no physical or local backup. — Put simply, relying on the cloud puts you at risk if the cloud lets you down.
  • 16. — HPC Systems: — It is not clear that you can run compute-intensive HPC applications that use MPI/OpenMP! — Scheduling is important with this type of application, as you want all the VMs to be co-located to minimize communication latency! — General Concerns: — Each cloud system uses different protocols and different APIs, so it may not be possible to run applications between cloud-based systems. — Amazon has created its own DB system (not SQL-92) and workflow system (there are many popular workflow systems out there), so your normal applications will have to be adapted to execute on these platforms.
  • 17.
  • 18.
  • 19. Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it… Dan Ariely
  • 20. You take the Blue Pill, The story ends. You wake up in your bed and believe whatever you want to believe. You take the Red Pill, You stay in Wonderland and I show You how deep the rabbit hole goes I'm only offering You the TRUTH... Nothing more.
  • 21.
  • 22. SKA (first light planned for 2020) will produce at least 1.5 PB/day (100 PB/day is foreseeable). Great! But it is just a number... What does 1.5 PB mean???
  • 24. SKA WILL ALSO FILL ABOUT 1,000,000,000 AMAZON KINDLES PER DAY The largest library in the world is the Library of Congress, Washington, D.C., USA with ONLY 30,000,000 books… The US Census Bureau (December 2010) estimate for 2020 is 7.7 billion people… So to SEE each day's worth of SKA data, each person in the world would have to read about 100,000 books per day… ARE YOU READY FOR THIS??? AND THIS IS JUST ONE SURVEY!!!
  • 25. I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I've watched c-beams glitter in the dark near the Tannhäuser Gate. All those ... moments will be lost in time, like tears...in rain. Time to die…
  • 26. What is big data? Big Data is any thing which is crash Excel. Small Data is when is fit in RAM. Big Data is when is crash because is not fit in RAM. Or, in other words, Big Data is data in volumes too great to process by traditional methods.
  • 27. — Volume — data volumes are becoming unmanageable — Variety — data complexity is growing — more types of data captured than previously — Velocity — some data is arriving so rapidly that it must either be processed instantly, or lost — this is a whole subfield called “stream processing”
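The velocity point can be illustrated with a minimal stream-processing sketch: each value is folded into a running summary as it arrives and is then discarded, so nothing has to be stored. The `RunningStats` class and the sample readings are hypothetical and not taken from the slides; the update rule is Welford's online algorithm.

```python
class RunningStats:
    """Running mean and variance over a stream (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        # Fold one new value into the summary; the value itself is not kept.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Sample variance; not defined for fewer than two values.
        return self._m2 / (self.count - 1) if self.count > 1 else float("nan")


def sensor_stream():
    # Hypothetical stand-in for data arriving too fast to be stored.
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield reading


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)

print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant however long the stream runs, which is the property that matters when data must be processed on arrival or lost.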
  • 28. — Big quantities of data are acquired everywhere. — It is now a big issue in all aspects of life: science, business, healthcare, government, social networks, national security, media, etc.
  • 29. How much data are there in the world? • From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data. • In 2011 the same amount was created every two days. • In 2013, the same amount was created every 10 minutes.
  • 30. • Computing power doubles every 18 months (Moore’s Law) ... • 100x improvement in 10 years • The amount of data doubles every year (or faster!) ... • 1000x in 10 years, and 1,000,000x in 20 yrs. • I/O bandwidth increases ~10% / year • <3x improvement in 10 years.
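The growth factors quoted on this slide follow from compound growth. The short check below reproduces them; the function name `growth_factor` and its parameters are illustrative choices, not part of the original material.

```python
def growth_factor(doubling_time_years=None, annual_rate=None, years=10):
    """Compound growth over `years`, given a doubling time or an annual rate."""
    if doubling_time_years is not None:
        return 2 ** (years / doubling_time_years)
    return (1 + annual_rate) ** years


# Computing power: doubles every 18 months (Moore's law).
compute_10y = growth_factor(doubling_time_years=1.5)
# Data volume: doubles every year.
data_10y = growth_factor(doubling_time_years=1.0)
data_20y = growth_factor(doubling_time_years=1.0, years=20)
# I/O bandwidth: grows about 10% per year.
io_10y = growth_factor(annual_rate=0.10)

print(f"computing power, 10 years: about {compute_10y:.0f}x")
print(f"data volume, 10 years: about {data_10y:.0f}x")
print(f"data volume, 20 years: about {data_20y:.0f}x")
print(f"I/O bandwidth, 10 years: about {io_10y:.1f}x")
```

The gap between the last figure and the others is the argument for the slides that follow: data grow far faster than the bandwidth available to move them.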
  • 31. — Price/performance of disks is improving faster than that of computing (Moore’s law): a factor of ~ 100 over 10 years! — Disks are now already cheaper than paper — Network bandwidth used to grow even faster, but no longer does — And most telcos are bankrupt … — Sneakernet is faster than any network — Disks make data preservation easier as the storage technology evolves — Can you still read your 10 (5?) year old tapes?
  • 32. — Do not move data over the network: bring the computation to the data! — The Beowulf paradigm: Datawulf clusters, smart disks … — The Grid paradigm (done right): move only the questions and answers, and the flow control — You will learn to use databases! — Everybody needs efficient database techniques and data mining (DM): searches, trends & correlations, anomaly/outlier detection, clustering/classification, summarization, visualization, etc.
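A small sketch of "move only the questions and answers", using Python's built-in `sqlite3` module as a stand-in for a remote database. The table `sources` and its columns are invented for illustration. The first approach ships every row to the client before computing; the second sends one query and receives two numbers.

```python
import sqlite3

# In-memory database standing in for a large remote catalogue (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, band TEXT, flux REAL)")
rows = [(i, "r" if i % 2 else "g", float(i)) for i in range(1, 10001)]
conn.executemany("INSERT INTO sources VALUES (?, ?, ?)", rows)

# Moving the data: fetch all 10,000 rows, then average on the client.
all_rows = conn.execute("SELECT band, flux FROM sources").fetchall()
totals = {}
for band, flux in all_rows:
    total, n = totals.get(band, (0.0, 0))
    totals[band] = (total + flux, n + 1)
client_means = {band: total / n for band, (total, n) in totals.items()}

# Moving the question: the database aggregates and returns one row per band.
db_means = dict(conn.execute("SELECT band, AVG(flux) FROM sources GROUP BY band"))

print(len(all_rows), "rows transferred versus", len(db_means))
print(client_means == db_means)
conn.close()
```

Both routes give the same answer; the difference is how much has to cross the connection, which dominates once the table no longer fits on the client.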
  • 33. — Clustering — Association learning — Parameter estimation — Recommendation engines — Classification — Similarity matching — Neural networks — Bayesian networks — Genetic algorithms — Etc.
  • 34. Graphic provided by Professor S. G. Djorgovski, Caltech • Clustering – examine the data and find the data clusters (clouds), without considering what the items are = Characterization ! • Classification – for each new data item, try to place it within a known class (i.e., a known category or cluster) = Classify ! • Outlier Detection – identify those data items that don’t fit into the known classes or clusters = Surprise !
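The three tasks on this slide can be sketched on toy one-dimensional data. This is a minimal illustration under stated assumptions, not the method behind the graphic: plain k-means stands in for clustering, nearest-centroid assignment for classification, and a fixed distance threshold for outlier detection. All function names, data values and the threshold are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Keep the old centroid if a group ends up empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def classify(value, centroids):
    """Classification: index of the known cluster closest to a new item."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def is_outlier(value, centroids, threshold):
    """Outlier detection: the item is far from every known cluster."""
    return min(abs(value - c) for c in centroids) > threshold


data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]

# Clustering: find the groups without knowing what the items are.
centroids = kmeans_1d(data, centroids=[min(data), max(data)])
print("cluster centres:", centroids)

# Classification and outlier detection for new items.
print("4.7 belongs to cluster", classify(4.7, centroids))
print("3.0 is an outlier:", is_outlier(3.0, centroids, threshold=1.0))
```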
  • 35. — Supervised — we have training data with correct answers — use training data to prepare the algorithm — then apply it to data without a known answer — Unsupervised — no training data — throw data into the algorithm, hope it makes some kind of sense out of the data
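A minimal sketch of the supervised workflow: training data with correct answers prepare the algorithm, which is then applied to data without a known answer. A k-nearest-neighbours vote is used here as one simple choice of algorithm; the feature values and the labels "star" and "galaxy" are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(training, point, k=3):
    """Label `point` by majority vote among its k nearest labelled examples."""
    nearest = sorted(training, key=lambda example: math.dist(example[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Training data: (features, correct answer) pairs. Values are hypothetical.
training = [
    ((0.10, 0.20), "star"), ((0.20, 0.10), "star"), ((0.15, 0.30), "star"),
    ((0.90, 0.80), "galaxy"), ((0.80, 0.95), "galaxy"), ((1.00, 0.90), "galaxy"),
]

# New data without a known answer.
unlabeled = [(0.12, 0.18), (0.85, 0.90)]
predictions = [knn_predict(training, point) for point in unlabeled]
print(predictions)
```

An unsupervised method would receive only the feature pairs, with no labels, and would have to discover the two groups on its own.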
  • 36. — Regression — predicting a variable from data — Classification — assigning records to predefined groups — Clustering — splitting records into groups based on similarity — Association learning — seeing what often appears together with what
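Association learning, "seeing what often appears together with what", can be sketched by counting co-occurrences. The shopping baskets below are invented, and `support` and `confidence` are the usual association-rule measures, computed here in the simplest possible way.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is a set of items seen together.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

# Count how often each item and each unordered pair of items occurs.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def support(pair):
    """Fraction of baskets that contain both items of an alphabetically sorted pair."""
    return pair_counts[pair] / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction also containing `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


print("support(bread, butter):", support(("bread", "butter")))
print("confidence(bread -> butter):", confidence("bread", "butter"))
print("confidence(jam -> bread):", confidence("jam", "bread"))
```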
  • 37. — Data is usually noisy in some way — imprecise input values — hidden/latent input values — Inductive bias — basically, the shape of the algorithm we choose: — may not fit the data at all — may induce underfitting or overfitting — Machine learning without inductive bias is not possible
  • 38. — Underfitting: using an algorithm that cannot capture the full complexity of the data
  • 39. — Overfitting: tuning the algorithm so carefully it starts matching the noise in the training data
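Underfitting and overfitting can be compared on a toy problem. In this sketch, which uses invented data, the true relation is y = 2x plus fixed noise values, and three models are fitted: a constant (too simple), a straight line (appropriate), and a lookup table that memorizes the training points (too flexible). The test set repeats the same x values with different noise.

```python
def mse(model, xs, ys):
    """Mean squared error of `model` on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(xs, ys):
    # Too simple: ignores x and always predicts the mean of y.
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    # Ordinary least-squares straight line.
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    # Too flexible: returns the y of the nearest training point, noise included.
    table = list(zip(xs, ys))
    return lambda x: min(table, key=lambda pair: abs(pair[0] - x))[1]


xs = [0, 1, 2, 3, 4, 5, 6, 7]
noise_train = [0.5, -0.6, 0.4, -0.5, 0.6, -0.4, 0.5, -0.5]
noise_test = [-0.5, 0.4, -0.6, 0.5, -0.4, 0.6, 0.5, -0.5]
y_train = [2 * x + n for x, n in zip(xs, noise_train)]
y_test = [2 * x + n for x, n in zip(xs, noise_test)]

results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(xs, y_train)
    results[name] = (mse(model, xs, y_train), mse(model, xs, y_test))
    print(f"{name:10s} train MSE = {results[name][0]:.3f}  test MSE = {results[name][1]:.3f}")
```

The constant model has a large error on both sets. The memorizer has zero training error but a larger test error than the line, because it has matched the noise in the training data.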
  • 41. — Job opportunities are sky-rocketing — Extremely high demand for Data Science skills — Demand will continue to increase — Old: “100 applicants per job”. Future: “100 jobs per applicant”
  • 42. — Job opportunities are sky-rocketing — Extremely high demand for Data Science skills — Demand will continue to increase — Old: “100 applicants per job”. Future: “100 jobs per applicant” McKinsey Report (2011**) : • Big Data is the new “gold rush” , the “new oil” • 1.5 million skilled data scientist shortage within 3 years • ** http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation
  • 43. — Job opportunities are sky-rocketing — Extremely high demand for Data Science skills — Demand will continue to increase — Old: “100 applicants per job”. New: “100 jobs per applicant” McKinsey Report (2011**) : • Big Data is the new “gold rush” , the “new oil” • 1.5 million skilled data scientist shortage within 5 years • ** http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation • Hardly anyone knows this stuff • It’s a big field, with lots and lots of theory • And it’s all maths, so it’s tricky to learn
  • 44.
  • 48. Well, in conclusion… we have not concluded yet (and we never will); in reality, we have just started… • Big data are already here for some use cases and just around the corner for many others. THE CORE IS: For the Red Pill consumers: YES, Big Data are changing the way we do science and/or business. A new era of analysis has started. For the Blue Pill consumers: Don’t worry, tomorrow you will forget everything; you will just have a little déjà vu…
  • 49. N-N-N-NO TIME, NO TIME, NO TIME! HELLO, GOOD BYE, I AM LATE, I AM LATE…. JUST TIME FOR A FEW QUESTIONS!