Intensive Data Processing (Big Data)
Nelson F. F. Ebecken
NTT/COPPE/UFRJ
Your Big Data Is Worthless if You Don’t Bring It Into the Real World 
http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/
Big Data 
Big Data refers to data that is too big to fit on a 
single server, too unstructured to fit into a 
row-and-column database, or too 
continuously flowing to fit into a static data 
warehouse (Thomas H. Davenport)
Big Data and traditional analytics
                   Big Data                           Traditional analytics
Type of data       Unstructured formats               Formatted in rows and columns
Volume of data     100 terabytes to petabytes         Tens of terabytes or less
Flow of data       Constant flow of data              Static pool of data
Analysis methods   Machine learning                   Hypothesis-based
Primary purpose    Data-based products and services   Internal decision support
A menu of big data possibilities
Style of data      Source of data   Industry affected    Function affected
Large volume       Online           Financial services   Marketing
Unstructured       Video            Health care          Supply chain
Continuous flow    Sensor           Manufacturing        Human resources
Multiple formats   Genomic          Travel/transport     Finance
Terminology for using and analyzing data
Term                                  Time frame     Specific meaning
Decision support                      1970-1985      Use of data analysis to support decision making
Executive support                     1980-1990      Focus on data analysis for decisions by senior executives
Online analytical processing (OLAP)   1990-2000      Software for analyzing multidimensional data tables
Business intelligence                 1989-2005      Tools to support data-driven decisions, with emphasis on reporting
Analytics                             2005-2010      Focus on statistical and mathematical analysis for decisions
Big Data                              2010-present   Focus on very large, unstructured, fast-moving data
How important is Big Data to You and Your Organization?
• Has your management team considered some of the new types of data that may affect your business and industry, both now and in the next several years?
• Have you discussed the term big data and whether it's a good description of what your organization is doing with data and analytics?
• Are you beginning to change your decision-making processes toward a more continuous approach driven by the continuous availability of data?
• Has your organization adopted faster and more agile approaches to analyzing and acting on important data and analysis?
• Are you beginning to focus more on external information about business and market environments?
• Have you made a big bet on big data?
Big data is going to reshape a lot of different businesses and industries
• Every industry that moves things
• Every industry that sells to consumers
• Every industry that employs machinery
• Every industry that sells or uses content
• Every industry that provides service
• Every industry that has physical facilities
• Every industry that involves money
Responsibility locus for big data projects
                             Discovery                                   Production
Cost savings                 IT innovation group                         IT architecture and operations
Faster decisions             Business unit or function analytics group   Business unit or function executive
Better decisions             Business unit or function analytics group   Business unit or function executive
Product/service innovation   R&D or product development group            Product development or product management
Overview of technologies for big data
Technology                          Definition
Hadoop                              Open source software for processing big data across multiple parallel servers
MapReduce                           The architectural framework on which Hadoop is based
Scripting languages                 Programming languages that work well with big data (Python, Pig, Hive...)
Machine learning                    Algorithms for rapidly finding the model that best fits a data set
Visual analytics                    Display of analytical results in visual or graphic formats
Natural language processing (NLP)   Algorithms for analyzing text, frequencies, meanings,...
In-memory analytics                 Processing big data in computer memory for greater speed
MapReduce
MapReduce is a programming model for expressing distributed computations on massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers.
• It was originally developed by Google
• In 2003, Google published the paper describing its distributed file system, GFS
• In 2004, Google published the paper that introduced MapReduce
MapReduce has since enjoyed widespread adoption via an open-source implementation called Hadoop, an Apache project whose development was led by Yahoo.
Programming Model
Input & Output: each a set of key/value pairs
Programmer specifies two functions:
map (in_key, in_value) -> list(out_key, intermediate_value)
• Processes input key/value pair
• Produces set of intermediate pairs
reduce (out_key, list(intermediate_value)) -> list(out_value)
• Combines all intermediate values for a particular key
• Produces a set of merged output values (usually just one)
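The two signatures can be illustrated with the classic word-count job. This is a minimal single-process sketch, not Hadoop code: `run_mapreduce` is a hypothetical helper that stands in for the framework, and the sample documents are made up for illustration.

```python
from collections import defaultdict

def map_fn(in_key, in_value):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    # in_key is a document name, in_value is its text.
    return [(word.lower(), 1) for word in in_value.split()]

def reduce_fn(out_key, intermediate_values):
    """reduce(out_key, list(intermediate_value)) -> list(out_value)."""
    return [sum(intermediate_values)]

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Hypothetical stand-in for the framework: group intermediate
    # values by key, then call reduce once per key.
    groups = defaultdict(list)
    for in_key, in_value in inputs:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}

documents = [("doc1", "big data is big"), ("doc2", "data flows")]
word_counts = run_mapreduce(documents, map_fn, reduce_fn)
```

In a real cluster the grouping step is the shuffle performed by the framework between the map and reduce tasks; only `map_fn` and `reduce_fn` are written by the programmer.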
Map-Reduce
• Parallel programming for large masses of data
[Figure: MapReduce data flow. Each input goes to a Map task (Map/Combine/Partition), the resulting key/value pairs are shuffled, and each Reduce task (Sort/Reduce) writes one output.]
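The stages named in the diagram (Map/Combine/Partition, Shuffle, Sort/Reduce) can be traced in a small single-process sketch. All function names, the number of reducers, and the input splits are assumptions chosen for illustration; a real framework runs these stages on different machines.

```python
from collections import Counter, defaultdict
import zlib

NUM_REDUCERS = 2  # assumption: two reduce tasks

def partition(key, num_reducers):
    # Stable hash so a given key always reaches the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def map_combine_partition(split, num_reducers):
    # Map + Combine: Counter emits each word once with a locally pre-summed count.
    combined = Counter(split.split())
    # Partition: assign every (key, count) pair to a reducer bucket.
    buckets = defaultdict(list)
    for key, count in combined.items():
        buckets[partition(key, num_reducers)].append((key, count))
    return buckets

def shuffle(all_buckets, num_reducers):
    # Shuffle: move every bucket to the reducer that owns it.
    reducer_inputs = [[] for _ in range(num_reducers)]
    for buckets in all_buckets:
        for reducer, pairs in buckets.items():
            reducer_inputs[reducer].extend(pairs)
    return reducer_inputs

def sort_reduce(pairs):
    # Sort by key, then sum the values of each key.
    totals = {}
    for key, value in sorted(pairs):
        totals[key] = totals.get(key, 0) + value
    return totals

splits = ["a b a", "b c", "a c c"]
mapped = [map_combine_partition(s, NUM_REDUCERS) for s in splits]
outputs = [sort_reduce(pairs) for pairs in shuffle(mapped, NUM_REDUCERS)]
merged = {key: total for out in outputs for key, total in out.items()}
```

The combiner reduces the data sent over the network (the first split ships `("a", 2)` instead of two separate pairs), and the partitioner guarantees that each key is reduced in exactly one place.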
Why learn models in MapReduce?
• High data throughput
– Stream about 100 Tb per hour using 500 mappers
• Framework provides fault tolerance
– Monitors mappers and reducers and restarts tasks on other machines should one of the machines fail
• Excels in counting patterns over data records
• Built on relatively cheap, commodity hardware
– No special-purpose computing hardware
• Large volumes of data are increasingly being stored on Grid clusters running MapReduce
– Especially in the internet domain
Why learn models in MapReduce?
• Learning can become limited by computation time and not data volume
– With large enough data and number of machines
– Reduces the need to down-sample data
– More accurate parameter estimates compared to learning on a single machine for the same amount of time
Learning models in MapReduce
• A primer for learning models in MapReduce (MR)
– Illustrate techniques for distributing the learning algorithm in a MapReduce framework
– Focus on the mapper and reducer computations
• Data-parallel algorithms are most appropriate for MapReduce implementations
• Not necessarily the optimal implementation for a specific algorithm
– Other specialized non-MapReduce implementations exist for some algorithms, which may be better
• MR may not be the appropriate framework for exact solutions of non-data-parallel (sequential) algorithms
– Approximate solutions using MapReduce may be good enough
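As one illustration of a data-parallel learning algorithm, consider fitting a one-parameter least-squares line y = w * x. The estimate w = sum(x*y) / sum(x*x) depends only on two sums, so each mapper can compute partial sums over its shard and a single reducer can add them. The sketch below runs in one process; the function names and the toy shards are assumptions, not taken from the slides.

```python
def map_partial_sums(records):
    """Mapper: one pass over a shard, emitting partial sufficient statistics."""
    sxy = sum(x * y for x, y in records)
    sxx = sum(x * x for x, _ in records)
    return ("stats", (sxy, sxx))

def reduce_fit(key, partials):
    """Reducer: add the partial sums and solve for the slope."""
    sxy = sum(p[0] for p in partials)
    sxx = sum(p[1] for p in partials)
    return sxy / sxx

# Toy data generated from y = 2 * x, split across three shards.
shards = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(4.0, 8.0), (5.0, 10.0)],
]
partials = [map_partial_sums(shard)[1] for shard in shards]
slope = reduce_fit("stats", partials)

# Same estimate computed on a single machine, for comparison.
all_records = [rec for shard in shards for rec in shard]
single_machine_slope = reduce_fit("stats", [map_partial_sums(all_records)[1]])
```

Because the sums are associative, the distributed result matches the single-machine one exactly. Algorithms whose updates depend on the result of the previous record (sequential algorithms) do not decompose this way, which is the limitation the slide points out.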
Types of learning in MapReduce
• Three common types of learning models using the MapReduce framework
1. Parallel training of multiple models
– Train either in mappers or reducers
2. Ensemble training methods
– Train multiple models and combine them
3. Distributed learning algorithms
– Learn using both mappers and reducers
Use the Grid as a large cluster of independent machines (with fault tolerance)
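The second type, ensemble training, can be sketched as follows: each mapper trains a small model on its own shard, and the models are combined afterwards by majority vote. The nearest-centroid classifier, the function names, and the toy shards are assumptions chosen to keep the example short; any in-memory learner could take their place.

```python
from collections import Counter

def train_centroid_model(shard):
    """Mapper: fit a tiny nearest-centroid classifier on one shard."""
    sums, counts = {}, {}
    for x, label in shard:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    # The model is a dict mapping each label to the mean of its points.
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    # Pick the label whose centroid is closest to x.
    return min(model, key=lambda label: abs(x - model[label]))

def ensemble_predict(models, x):
    """Combine step: majority vote over the independently trained models."""
    votes = Counter(predict(model, x) for model in models)
    return votes.most_common(1)[0][0]

shards = [
    [(0.1, "low"), (0.3, "low"), (0.9, "high")],
    [(0.2, "low"), (0.8, "high"), (1.0, "high")],
    [(0.0, "low"), (0.7, "high"), (1.1, "high")],
]
models = [train_centroid_model(shard) for shard in shards]
label_a = ensemble_predict(models, 0.25)
label_b = ensemble_predict(models, 0.85)
```

No communication is needed between the training tasks, so the cluster is used as a set of independent machines and the framework only contributes scheduling and fault tolerance.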
Parallel training of multiple models
• Train multiple models simultaneously using a learning algorithm that can be learned in memory
• Useful when individual models are trained using a subset, a filtered version, or a modification of the raw data
• Can train thousands of models simultaneously
• Essentially, treat the Grid as a large cluster of machines
– Leverage fault tolerance of Hadoop
• Train 1 model in each reducer
– Map:
  • Input: All data
  • Filters subset of data relevant for each model training
  • Output: <model_index, subset of data for training this model>
– Reduce:
  • Train model on data corresponding to that model_index
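The map and reduce roles described above can be sketched in a single process. The filters in `MODEL_FILTERS`, the record fields, and the "model" itself (a mean of the `amount` field) are hypothetical placeholders; in practice the reducer would call whatever in-memory learner is being trained.

```python
from collections import defaultdict

# Hypothetical filters: which records feed each model.
MODEL_FILTERS = {
    0: lambda rec: rec["region"] == "north",
    1: lambda rec: rec["region"] == "south",
    2: lambda rec: rec["amount"] > 100,
}

def map_fn(record):
    """Map: emit <model_index, record> for every model that uses this record."""
    for model_index, keep in MODEL_FILTERS.items():
        if keep(record):
            yield model_index, record

def reduce_fn(model_index, records):
    """Reduce: train one model in memory on the data for that model_index."""
    amounts = [rec["amount"] for rec in records]
    return {"n": len(amounts), "mean_amount": sum(amounts) / len(amounts)}

records = [
    {"region": "north", "amount": 50},
    {"region": "north", "amount": 150},
    {"region": "south", "amount": 200},
    {"region": "south", "amount": 80},
]

# Stand-in for the shuffle: group mapper output by model_index.
grouped = defaultdict(list)
for record in records:
    for model_index, rec in map_fn(record):
        grouped[model_index].append(rec)

models = {idx: reduce_fn(idx, recs) for idx, recs in grouped.items()}
```

A record may be emitted under several model indexes (here the 150 and 200 records feed two models each), which is how overlapping training subsets are handled without any coordination between reducers.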
Apache Mahout 
Scalable to large data sets. Our core algorithms for clustering, classification and 
collaborative filtering are implemented on top of scalable, distributed systems. 
However, contributions that run on a single machine are welcome as well. 
Scalable to support your business case. Mahout is distributed under a 
commercially friendly Apache Software license. 
Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse 
community to facilitate discussions not only on the project itself but also on potential 
use cases. Come to the mailing lists to find out more. 
Currently Mahout supports mainly three use cases: Recommendation mining takes 
users' behavior and from that tries to find items users might like. Clustering takes 
e.g. text documents and groups them into groups of topically related documents. 
Classification learns from existing categorized documents what documents of a 
specific category look like and is able to assign unlabelled documents to the 
(hopefully) correct category. 
25 April 2014 - Goodbye MapReduce 
The Mahout community decided to move its codebase onto modern data processing systems that offer a richer 
programming model and more efficient execution than Hadoop MapReduce. Mahout will therefore reject new 
MapReduce algorithm implementations from now on. We will however keep our widely used MapReduce 
algorithms in the codebase and maintain them. 
We are building our future implementations on top of a DSL for linear algebraic operations which has been 
developed over the last months. Programs written in this DSL are automatically optimized and executed in 
parallel on Apache Spark. 
Furthermore, there is an experimental contribution underway which aims to integrate the H2O platform into
Mahout.
Apache Spark™ is a fast and general engine for large-scale data processing. 
H2O is the open source in memory solution from 0xdata for predictive analytics on big data.
Matrix Methods with Hadoop
Slides: bit.ly/10SIe1A
Code: github.com/dgleich/matrix-hadoop-tutorial
David F. Gleich, Assistant Professor
Computer Science, Purdue University
ACM KDD 2014 
24-27 August 2014
New environments: Microsoft Azure ML Studio, Google Prediction API, ...
2 Research Sessions + Industry & Government 
Statistical Techniques for Big Data 
Scaling-up Methods for Big Data 
Topic Modeling
Big data & machine learning 
This is a huge field, growing very fast 
Many algorithms and techniques: 
can be seen as a giant toolbox with wide-ranging applications 
Ranging from the very simple to the extremely sophisticated 
Difficult to see the big picture 
Huge range of applications 
Math skills are crucial

More Related Content

What's hot

Machine Learning Hadoop
Machine Learning HadoopMachine Learning Hadoop
Machine Learning HadoopAletheLabs
 
Hadoop for Finance - sample chapter
Hadoop for Finance - sample chapterHadoop for Finance - sample chapter
Hadoop for Finance - sample chapterRajiv Tiwari
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopGhassan Al-Yafie
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop siliconsudipt
 
Big Data Analytics Using Hadoop
Big Data Analytics Using HadoopBig Data Analytics Using Hadoop
Big Data Analytics Using HadoopSrikanth VNV
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersDataWorks Summit
 
Hadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureHadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureDataWorks Summit
 
DW Appliance
DW ApplianceDW Appliance
DW ApplianceShankar R
 
Real time data processing frameworks
Real time data processing frameworksReal time data processing frameworks
Real time data processing frameworksIJDKP
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big DataForwardSprint
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...inside-BigData.com
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringIRJET Journal
 

What's hot (20)

Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Hadoop(Term Paper)
Hadoop(Term Paper)Hadoop(Term Paper)
Hadoop(Term Paper)
 
Machine Learning Hadoop
Machine Learning HadoopMachine Learning Hadoop
Machine Learning Hadoop
 
Hadoop for Finance - sample chapter
Hadoop for Finance - sample chapterHadoop for Finance - sample chapter
Hadoop for Finance - sample chapter
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoop
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop
 
Actian DataFlow Whitepaper
Actian DataFlow WhitepaperActian DataFlow Whitepaper
Actian DataFlow Whitepaper
 
Big Data Analytics Using Hadoop
Big Data Analytics Using HadoopBig Data Analytics Using Hadoop
Big Data Analytics Using Hadoop
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
 
Hadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureHadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the Future
 
DW Appliance
DW ApplianceDW Appliance
DW Appliance
 
Real time data processing frameworks
Real time data processing frameworksReal time data processing frameworks
Real time data processing frameworks
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big Data
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Exploring Big Data Analytics Tools
Exploring Big Data Analytics ToolsExploring Big Data Analytics Tools
Exploring Big Data Analytics Tools
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
 

Viewers also liked

27/09/2011 - 14h às 18h - encontro de negócios com software livre - Arlindo M...
27/09/2011 - 14h às 18h - encontro de negócios com software livre - Arlindo M...27/09/2011 - 14h às 18h - encontro de negócios com software livre - Arlindo M...
27/09/2011 - 14h às 18h - encontro de negócios com software livre - Arlindo M...Rio Info
 
Inovação e equipes geograficamente distribuídas - Palestrante: Maíra Gatti
Inovação e equipes geograficamente distribuídas - Palestrante: Maíra GattiInovação e equipes geograficamente distribuídas - Palestrante: Maíra Gatti
Inovação e equipes geograficamente distribuídas - Palestrante: Maíra GattiRio Info
 
Rio Info 2010 - Encontro PSVs IT - Vanda Scartezini
Rio Info 2010 - Encontro PSVs IT - Vanda ScarteziniRio Info 2010 - Encontro PSVs IT - Vanda Scartezini
Rio Info 2010 - Encontro PSVs IT - Vanda ScarteziniRio Info
 
dia 27/09/2011 - 14h às 17h30 - Talentos 2.0 - Karen Gallant
dia 27/09/2011 - 14h às 17h30 - Talentos 2.0 -  Karen Gallantdia 27/09/2011 - 14h às 17h30 - Talentos 2.0 -  Karen Gallant
dia 27/09/2011 - 14h às 17h30 - Talentos 2.0 - Karen GallantRio Info
 
Xalles presentation for Rio Info Portugal - may 18 2010
Xalles presentation for Rio Info Portugal - may 18 2010Xalles presentation for Rio Info Portugal - may 18 2010
Xalles presentation for Rio Info Portugal - may 18 2010Rio Info
 
watten reserach why its sale decline
watten reserach why its sale declinewatten reserach why its sale decline
watten reserach why its sale declineWahab Yunus
 
Rio Info 2010 - Encontro PSVs IT - Djalma Petit
Rio Info 2010 - Encontro PSVs IT - Djalma PetitRio Info 2010 - Encontro PSVs IT - Djalma Petit
Rio Info 2010 - Encontro PSVs IT - Djalma PetitRio Info
 
Rio Info 2009 - Europeana - Bram van der Werf
Rio Info 2009 - Europeana - Bram van der WerfRio Info 2009 - Europeana - Bram van der Werf
Rio Info 2009 - Europeana - Bram van der WerfRio Info
 
Big data: tendências e oportunidades - Palestrante: Cezar Taurion
Big data: tendências e oportunidades - Palestrante: Cezar TaurionBig data: tendências e oportunidades - Palestrante: Cezar Taurion
Big data: tendências e oportunidades - Palestrante: Cezar TaurionRio Info
 
RioInfo 2010: Seminário de Tecnologia - Mesa 1 - Integração e Convergência Ma...
RioInfo 2010: Seminário de Tecnologia - Mesa 1 - Integração e Convergência Ma...RioInfo 2010: Seminário de Tecnologia - Mesa 1 - Integração e Convergência Ma...
RioInfo 2010: Seminário de Tecnologia - Mesa 1 - Integração e Convergência Ma...Rio Info
 
Wateen final (research method)
Wateen final (research method)Wateen final (research method)
Wateen final (research method)Wahab Yunus
 
Why We Need Friends 97 2003
Why We Need Friends 97 2003Why We Need Friends 97 2003
Why We Need Friends 97 2003Wahab Yunus
 

Viewers also liked (16)

27/09/2011 - 14h às 18h - encontro de negócios com software livre - Arlindo M...
27/09/2011 - 14h às 18h - encontro de negócios com software livre - Arlindo M...27/09/2011 - 14h às 18h - encontro de negócios com software livre - Arlindo M...
27/09/2011 - 14h às 18h - encontro de negócios com software livre - Arlindo M...
 
Inovação e equipes geograficamente distribuídas - Palestrante: Maíra Gatti
Inovação e equipes geograficamente distribuídas - Palestrante: Maíra GattiInovação e equipes geograficamente distribuídas - Palestrante: Maíra Gatti
Inovação e equipes geograficamente distribuídas - Palestrante: Maíra Gatti
 
Preposiciones
PreposicionesPreposiciones
Preposiciones
 
Chapter 1 Pt2
Chapter 1 Pt2Chapter 1 Pt2
Chapter 1 Pt2
 
Rio Info 2010 - Encontro PSVs IT - Vanda Scartezini
Rio Info 2010 - Encontro PSVs IT - Vanda ScarteziniRio Info 2010 - Encontro PSVs IT - Vanda Scartezini
Rio Info 2010 - Encontro PSVs IT - Vanda Scartezini
 
dia 27/09/2011 - 14h às 17h30 - Talentos 2.0 - Karen Gallant
dia 27/09/2011 - 14h às 17h30 - Talentos 2.0 -  Karen Gallantdia 27/09/2011 - 14h às 17h30 - Talentos 2.0 -  Karen Gallant
dia 27/09/2011 - 14h às 17h30 - Talentos 2.0 - Karen Gallant
 
Xalles presentation for Rio Info Portugal - may 18 2010
Xalles presentation for Rio Info Portugal - may 18 2010Xalles presentation for Rio Info Portugal - may 18 2010
Xalles presentation for Rio Info Portugal - may 18 2010
 
watten reserach why its sale decline
watten reserach why its sale declinewatten reserach why its sale decline
watten reserach why its sale decline
 
Rio Info 2010 - Encontro PSVs IT - Djalma Petit
Rio Info 2010 - Encontro PSVs IT - Djalma PetitRio Info 2010 - Encontro PSVs IT - Djalma Petit
Rio Info 2010 - Encontro PSVs IT - Djalma Petit
 
Rio Info 2009 - Europeana - Bram van der Werf
Rio Info 2009 - Europeana - Bram van der WerfRio Info 2009 - Europeana - Bram van der Werf
Rio Info 2009 - Europeana - Bram van der Werf
 
Big data: tendências e oportunidades - Palestrante: Cezar Taurion
Big data: tendências e oportunidades - Palestrante: Cezar TaurionBig data: tendências e oportunidades - Palestrante: Cezar Taurion
Big data: tendências e oportunidades - Palestrante: Cezar Taurion
 
RioInfo 2010: Seminário de Tecnologia - Mesa 1 - Integração e Convergência Ma...
RioInfo 2010: Seminário de Tecnologia - Mesa 1 - Integração e Convergência Ma...RioInfo 2010: Seminário de Tecnologia - Mesa 1 - Integração e Convergência Ma...
RioInfo 2010: Seminário de Tecnologia - Mesa 1 - Integração e Convergência Ma...
 
Wateen final (research method)
Wateen final (research method)Wateen final (research method)
Wateen final (research method)
 
Why We Need Friends 97 2003
Why We Need Friends 97 2003Why We Need Friends 97 2003
Why We Need Friends 97 2003
 
My career goals
My career goalsMy career goals
My career goals
 
Olpers final
Olpers finalOlpers final
Olpers final
 

Similar to Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Nelson Favilla

Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeSysfore Technologies
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...MLconf
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introductionsaisreealekhya
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy snehal parikh
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelEditor IJCATR
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsFredReynolds2
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET Journal
 
Rajesh Angadi Brochure
Rajesh Angadi Brochure Rajesh Angadi Brochure
Rajesh Angadi Brochure Rajesh Angadi
 
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...phdAssistance1
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfkalai75
 

Similar to Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Nelson Favilla (20)

Big Data
Big DataBig Data
Big Data
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
Bar camp bigdata
Bar camp bigdataBar camp bigdata
Bar camp bigdata
 
Big data Question bank.pdf
Big data Question bank.pdfBig data Question bank.pdf
Big data Question bank.pdf
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 
Recommendation engine
Recommendation engineRecommendation engine
Recommendation engine
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential Tools
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
 
Rajesh Angadi Brochure
Rajesh Angadi Brochure Rajesh Angadi Brochure
Rajesh Angadi Brochure
 
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
 

More from Rio Info

Rio Info 2015: Painel: Educação digital: experiências e oportunidades - Sylvi...
Rio Info 2015: Painel: Educação digital: experiências e oportunidades - Sylvi...Rio Info 2015: Painel: Educação digital: experiências e oportunidades - Sylvi...
Rio Info 2015: Painel: Educação digital: experiências e oportunidades - Sylvi...Rio Info
 
Rio Info 2015 - Desafio de tornar networking em faturamento - Cristina Dissat
Rio Info 2015 - Desafio de tornar networking em faturamento - Cristina DissatRio Info 2015 - Desafio de tornar networking em faturamento - Cristina Dissat
Rio Info 2015 - Desafio de tornar networking em faturamento - Cristina DissatRio Info
 
Rio Info 2015 - A verdade sobre os instrumentos de inovação - Luiz Claudio Souza
Rio Info 2015 - A verdade sobre os instrumentos de inovação - Luiz Claudio SouzaRio Info 2015 - A verdade sobre os instrumentos de inovação - Luiz Claudio Souza
Rio Info 2015 - A verdade sobre os instrumentos de inovação - Luiz Claudio SouzaRio Info
 
Rio Info 2015 - Salão da Inovação - Argentina - Visual Factory - Pablo Navarro
Rio Info 2015 - Salão da Inovação - Argentina - Visual Factory - Pablo NavarroRio Info 2015 - Salão da Inovação - Argentina - Visual Factory - Pablo Navarro
Rio Info 2015 - Salão da Inovação - Argentina - Visual Factory - Pablo NavarroRio Info
 
Rio Info 2015 - Como captar recursos não reembolsáveis em editais de inovação...
Rio Info 2015 - Como captar recursos não reembolsáveis em editais de inovação...Rio Info 2015 - Como captar recursos não reembolsáveis em editais de inovação...
Rio Info 2015 - Como captar recursos não reembolsáveis em editais de inovação...Rio Info
 
Rio Info 2015 - Plano de stock options o que fazer e o que não fazer - Marcel...
Rio Info 2015 - Plano de stock options o que fazer e o que não fazer - Marcel...Rio Info 2015 - Plano de stock options o que fazer e o que não fazer - Marcel...
Rio Info 2015 - Plano de stock options o que fazer e o que não fazer - Marcel...Rio Info
 
Rio Info 2015 - Empreendendo sonhos compartilhados - Natalie Witte
Rio Info 2015 - Empreendendo sonhos compartilhados - Natalie WitteRio Info 2015 - Empreendendo sonhos compartilhados - Natalie Witte
Rio Info 2015 - Empreendendo sonhos compartilhados - Natalie WitteRio Info
 
Rio Info 2015 - Salão da Inovação - Paraíba - Luiz Maurício Fraga martins
Rio Info 2015 - Salão da Inovação - Paraíba - Luiz Maurício Fraga martinsRio Info 2015 - Salão da Inovação - Paraíba - Luiz Maurício Fraga martins
Rio Info 2015 - Salão da Inovação - Paraíba - Luiz Maurício Fraga martinsRio Info
 
Rio Info 2015 - Salão da Inovação - Rio Grande do Sul - Leandro Araújo carras...
Rio Info 2015 - Salão da Inovação - Rio Grande do Sul - Leandro Araújo carras...Rio Info 2015 - Salão da Inovação - Rio Grande do Sul - Leandro Araújo carras...
Big data: Knowledge discovery in big data and cloud computing environments - Nelson Favilla

  • 1. Processamento Intensivo de Dados / Intensive Data Processing (Big Data). Nelson F. F. Ebecken, NTT/COPPE/UFRJ. "Your Big Data Is Worthless if You Don’t Bring It Into the Real World": http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/
  • 2. Big Data Big Data refers to data that is too big to fit on a single server, too unstructured to fit into a row-and-column database, or too continuously flowing to fit into a static data warehouse (Thomas H. Davenport)
  • 3. Big Data and traditional analytics
      Aspect | Big Data | Traditional analytics
      Type of data | Unstructured formats | Formatted in rows and columns
      Volume of data | 100 terabytes to petabytes | Tens of terabytes or less
      Flow of data | Constant flow of data | Static pool of data
      Analysis methods | Machine learning | Hypothesis-based
      Primary purpose | Data-based products | Internal decision support and services
  • 4. A menu of big data possibilities
      Style of data | Source of data | Industry affected | Function affected
      Large volume | Online | Financial services | Marketing
      Unstructured | Video | Health care | Supply chain
      Continuous flow | Sensor | Manufacturing | Human resources
      Multiple formats | Genomic | Travel/transport | Finance
  • 5. Terminology for using and analyzing data
      Term | Time frame | Specific meaning
      Decision support | 1970-1985 | Use of data analysis to support decision making
      Executive support | 1980-1990 | Focus on data analysis for decisions by senior executives
      Online analytical processing (OLAP) | 1990-2000 | Software for analysing multidimensional data tables
      Business intelligence | 1989-2005 | Tools to support data-driven decisions, with emphasis on reporting
      Analytics | 2005-2010 | Focus on statistical and mathematical analysis for decisions
      Big Data | 2010-present | Focus on very large, unstructured, fast-moving data
  • 6. How important is Big Data to you and your organization?
      - Has your management team considered some of the new types of data that may affect your business and industry, both now and in the next several years?
      - Have you discussed the term "big data" and whether it is a good description of what your organization is doing with data and analytics?
      - Are you beginning to change your decision-making processes toward a more continuous approach driven by the continuous availability of data?
      - Has your organization adopted faster and more agile approaches to analyzing and acting on important data and analysis?
      - Are you beginning to focus more on external information about business and market environments?
      - Have you made a big bet on big data?
  • 7. Big data is going to reshape a lot of different businesses and industries
      - Every industry that moves things
      - Every industry that sells to consumers
      - Every industry that employs machinery
      - Every industry that sells or uses content
      - Every industry that provides service
      - Every industry that has physical facilities
      - Every industry that involves money
  • 8. Responsibility locus for big data projects
      Stage | Cost savings | Faster decisions | Better decisions | Product/service innovation
      Discovery | IT innovation group | Business unit or function analytics group | Business unit or function analytics group | R&D or product development group
      Production | IT architecture and operations | Business unit or function executive | Business unit or function executive | Product development or product management
  • 9. Overview of technologies for big data
      Technology | Definition
      Hadoop | Open source software for processing big data across multiple parallel servers
      MapReduce | The architectural framework on which Hadoop is based
      Scripting languages | Programming languages that work well with big data (Python, Pig, Hive...)
      Machine learning | Algorithms for rapidly finding the model that best fits a data set
      Visual analytics | Display of analytical results in visual or graphic formats
      Natural language processing (NLP) | Algorithms for analyzing text, frequencies, meanings,...
      In-memory analytics | Processing big data in computer memory for greater speed
  • 10. MapReduce
      MapReduce is a programming model for expressing distributed computations on massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers.
      - It was originally developed by Google.
      - In 2003, Google described its distributed file system, called GFS.
      - In 2004, Google published the paper that introduced MapReduce.
      - MapReduce has since enjoyed widespread adoption via an open-source implementation called Hadoop (an Apache project), whose development was led by Yahoo.
  • 11. Programming Model
      Input & Output: each a set of key/value pairs.
      The programmer specifies two functions:
      - map (in_key, in_value) -> list(out_key, intermediate_value): processes an input key/value pair and produces a set of intermediate pairs.
      - reduce (out_key, list(intermediate_value)) -> list(out_value): produces a set of merged output values (usually just one).
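The two signatures above can be illustrated with a word count, the usual introductory example. The following is a minimal single-process sketch, not Hadoop code: `map_fn`, `reduce_fn`, and the `run_mapreduce` driver are hypothetical names, and the driver only imitates the grouping that a real framework performs across machines.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """map(in_key, in_value) -> list(out_key, intermediate_value).

    in_key is a document id (unused here); in_value is its text.
    Emits one (word, 1) pair per word.
    """
    for word in in_value.lower().split():
        yield word, 1


def reduce_fn(out_key, intermediate_values):
    """reduce(out_key, list(intermediate_value)) -> list(out_value).

    Merges all counts for one word into a single total.
    """
    return [sum(intermediate_values)]


def run_mapreduce(records):
    """Stand-in for the framework: apply map, group by key, apply reduce."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    docs = [("doc1", "big data is big"), ("doc2", "data flows")]
    print(run_mapreduce(docs))
    # {'big': [2], 'data': [2], 'flows': [1], 'is': [1]}
```

The programmer writes only `map_fn` and `reduce_fn`; everything in `run_mapreduce` is what the execution framework would otherwise provide.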
  • 12. Map-Reduce: parallel programming for large masses of data
      [Figure: each input feeds a Map task (Map/Combine/Partition) that emits key/val pairs; the pairs pass through the Shuffle to the Sort/Reduce tasks, each of which writes one Reduce output.]
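The stages named in the figure (Map/Combine/Partition, Shuffle, Sort/Reduce) can be imitated in one process to see how data moves between them. This is a rough sketch under stated assumptions: the function names, the word-count workload, and the choice of two reducers are illustrative, and `zlib.crc32` is used only as a convenient deterministic hash for partitioning.

```python
import zlib
from collections import Counter, defaultdict

NUM_REDUCERS = 2  # assumed value, for illustration only


def map_and_combine(split):
    """Map one input split to (word, 1) pairs and combine them locally,
    so each word leaves the mapper with one partial count."""
    return Counter(word for line in split for word in line.split())


def partition(key, num_reducers=NUM_REDUCERS):
    """Choose the reducer that owns a key, using a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(combined_outputs, num_reducers=NUM_REDUCERS):
    """Route every (key, partial count) pair to the reducer that owns the key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for combined in combined_outputs:
        for key, partial in combined.items():
            buckets[partition(key, num_reducers)][key].append(partial)
    return buckets


def sort_and_reduce(bucket):
    """Visit keys in sorted order and sum the partial counts of each key."""
    return [(key, sum(bucket[key])) for key in sorted(bucket)]


def run(splits, num_reducers=NUM_REDUCERS):
    """One mapper per split, one output list per reducer."""
    combined = [map_and_combine(split) for split in splits]
    return [sort_and_reduce(bucket) for bucket in shuffle(combined, num_reducers)]


if __name__ == "__main__":
    for reducer_id, output in enumerate(run([["a b a"], ["b c"]])):
        print(reducer_id, output)
```

The combiner reduces the volume sent through the shuffle, and the partitioner guarantees that all partial counts for a given key reach the same reducer.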
  • 13. Why learn models in MapReduce?
      - High data throughput: stream about 100 Tb per hour using 500 mappers.
      - Framework provides fault tolerance: it monitors mappers and reducers and restarts tasks on other machines should one of the machines fail.
      - Excels in counting patterns over data records.
      - Built on relatively cheap, commodity hardware: no special-purpose computing hardware.
      - Large volumes of data are increasingly being stored on Grid clusters running MapReduce, especially in the internet domain.
  • 14. Why learn models in MapReduce?
      - Learning can become limited by computation time and not data volume, given large enough data and number of machines.
      - Reduces the need to down-sample data.
      - More accurate parameter estimates compared to learning on a single machine for the same amount of time.
  • 15. Learning models in MapReduce
      - A primer for learning models in MapReduce (MR): illustrate techniques for distributing the learning algorithm in a MapReduce framework, with a focus on the mapper and reducer computations.
      - Data-parallel algorithms are most appropriate for MapReduce implementations.
      - Not necessarily the most optimal implementation for a specific algorithm: other specialized non-MapReduce implementations exist for some algorithms, which may be better.
      - MR may not be the appropriate framework for exact solutions of non-data-parallel/sequential algorithms: approximate solutions using MapReduce may be good enough.
  • 16. Types of learning in MapReduce
      Three common types of learning models using the MapReduce framework:
      1. Parallel training of multiple models: train either in mappers or reducers.
      2. Ensemble training methods: train multiple models and combine them.
      3. Distributed learning algorithms: learn using both mappers and reducers.
      Use the Grid as a large cluster of independent machines (with fault tolerance).
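The second type, ensemble training, can be sketched as follows: each mapper trains one model on its own data partition, and a single reducer combines the models. The example is an assumption-laden illustration rather than a method from the slides: the one-feature least-squares model, the function names, and the choice to combine by averaging predictions are all picked for brevity, and each partition is assumed to contain at least two distinct x values.

```python
from statistics import mean


def train_linear_model(points):
    """Fit y = slope * x + intercept by ordinary least squares on one partition."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx  # assumes the partition has at least two distinct x values
    return slope, y_bar - slope * x_bar


def map_train(partitions):
    """Map phase: one model per partition, all emitted under a single key."""
    return [("ensemble", train_linear_model(p)) for p in partitions]


def reduce_combine(models):
    """Reduce phase: build one predictor that averages the models' predictions."""
    def predict(x):
        return mean(slope * x + intercept for slope, intercept in models)
    return predict


if __name__ == "__main__":
    partitions = [[(0, 1), (1, 3), (2, 5)], [(3, 7), (4, 9)]]  # points on y = 2x + 1
    models = [model for _, model in map_train(partitions)]
    predict = reduce_combine(models)
    print(predict(10))
```

Because every mapper works independently, the training step needs no communication; only the small fitted models travel to the reducer.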
  • 17. Parallel training of multiple models
      - Train multiple models simultaneously using a learning algorithm that can be learnt in memory.
      - Useful when individual models are trained using a subset, a filtered version, or a modification of the raw data.
      - Can train thousands of models simultaneously.
      - Essentially, treat the Grid as a large cluster of machines and leverage the fault tolerance of Hadoop.
      - Train one model in each reducer:
          Map: the input is all data; the mapper filters the subset of data relevant for each model's training; the output is <model_index, subset of data for training this model>.
          Reduce: train the model on the data corresponding to that model_index.
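The map and reduce roles described in this slide can be sketched in a few lines. This is a single-process illustration under assumptions: the per-model filter functions, the record layout with a `"y"` field, and the trivial "model" (the mean of the target) are hypothetical stand-ins for a real in-memory learner.

```python
from collections import defaultdict
from statistics import mean


def map_fn(record, model_filters):
    """Emit <model_index, record> for every model whose filter accepts the record."""
    for model_index, keep in enumerate(model_filters):
        if keep(record):
            yield model_index, record


def reduce_fn(model_index, records):
    """Train one in-memory model on the data for this model_index.

    The 'model' here is only the mean of the target, as a placeholder.
    """
    return model_index, mean(r["y"] for r in records)


def train_all(records, model_filters):
    """Stand-in for the framework: group mapper output by model_index, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for model_index, kept in map_fn(record, model_filters):
            groups[model_index].append(kept)
    return dict(reduce_fn(index, group) for index, group in groups.items())


if __name__ == "__main__":
    data = [
        {"region": "north", "y": 1.0},
        {"region": "north", "y": 3.0},
        {"region": "south", "y": 10.0},
    ]
    filters = [
        lambda r: r["region"] == "north",  # model 0 sees only northern records
        lambda r: r["region"] == "south",  # model 1 sees only southern records
    ]
    print(train_all(data, filters))
```

Each `model_index` becomes one reduce key, so the framework can train as many models in parallel as there are reducers available.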
  • 18. Apache Mahout
      Scalable to large data sets. Our core algorithms for clustering, classification and collaborative filtering are implemented on top of scalable, distributed systems. However, contributions that run on a single machine are welcome as well.
      Scalable to support your business case. Mahout is distributed under a commercially friendly Apache Software license.
      Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Come to the mailing lists to find out more.
      Currently Mahout supports mainly three use cases:
      - Recommendation mining takes users' behavior and from that tries to find items users might like.
      - Clustering takes e.g. text documents and groups them into groups of topically related documents.
      - Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category.
      25 April 2014 - Goodbye MapReduce
      The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce. Mahout will therefore reject new MapReduce algorithm implementations from now on. We will however keep our widely used MapReduce algorithms in the codebase and maintain them.
      We are building our future implementations on top of a DSL for linear algebraic operations which has been developed over the last months. Programs written in this DSL are automatically optimized and executed in parallel on Apache Spark. Furthermore, there is an experimental contribution under way which aims to integrate the H2O platform into Mahout.
      Apache Spark™ is a fast and general engine for large-scale data processing.
      H2O is the open source in-memory solution from 0xdata for predictive analytics on big data.
  • 19. Matrix Methods with Hadoop. Slides: bit.ly/10SIe1A. Code: github.com/dgleich/matrix-hadoop-tutorial. David F. Gleich, Assistant Professor, Computer Science, Purdue University.
  • 21. ACM KDD 2014, 24-27/08
      New environments: Microsoft Azure ML Studio, Google Prediction API,…
      2 Research Sessions + Industry & Government:
      - Statistical Techniques for Big Data
      - Scaling-up Methods for Big Data
      - Topic Modeling
  • 22. Big data & machine learning
      - This is a huge field, growing very fast.
      - Many algorithms and techniques: it can be seen as a giant toolbox with wide-ranging applications, ranging from the very simple to the extremely sophisticated.
      - Difficult to see the big picture.
      - Huge range of applications.
      - Math skills are crucial.