Big Data ABC @andrefaria
Concepts
Relational 
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
Key Value Stores 
Riak, Memcached, Berkeley DB, HamsterDB, Couchbase, Voldemort, DynamoDB 
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
Document Stores 
Think of document databases as key-value stores where the value is examinable 
rich query language + indexes 
MongoDB, CouchDB, Terrastore, OrientDB, RavenDB 
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
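To make "key-value store where the value is examinable" concrete, here is a minimal in-memory sketch. The class `TinyDocumentStore` and its `put`/`find` methods are hypothetical names invented for this illustration; they are not the API of MongoDB, CouchDB, or any other product listed above.

```python
from collections import defaultdict


class TinyDocumentStore:
    """Toy document store: key -> document (a dict), plus optional field indexes."""

    def __init__(self, indexed_fields):
        self.docs = {}
        # One index per field: field name -> field value -> set of document keys.
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def put(self, key, doc):
        # Drop stale index entries when an existing key is overwritten.
        old = self.docs.get(key)
        if old is not None:
            for field, index in self.indexes.items():
                if field in old:
                    index[old[field]].discard(key)
        self.docs[key] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(key)

    def find(self, field, value):
        # Use the index when one exists; otherwise scan every document.
        if field in self.indexes:
            keys = self.indexes[field].get(value, set())
            return [self.docs[k] for k in sorted(keys)]
        return [d for _, d in sorted(self.docs.items()) if d.get(field) == value]


store = TinyDocumentStore(indexed_fields=["city"])
store.put("u1", {"name": "Ana", "city": "Sao Paulo"})
store.put("u2", {"name": "Bob", "city": "London", "age": 41})
store.put("u3", {"name": "Caio", "city": "Sao Paulo"})
sao_paulo_names = [d["name"] for d in store.find("city", "Sao Paulo")]
```

A plain key-value store could only answer `get("u1")`; because the store can look inside the value, it can also answer "which documents have `city` equal to X", and the index avoids a full scan for that field.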
Column Family Stores 
Each column family can be compared to a container of rows in an RDBMS table where the key identifies the row 
and the row consists of multiple columns. The difference is that various rows do not have to have the same columns, 
and columns can be added to any row at any time without having to add it to other rows. 
Cassandra, HBase, Hypertable 
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
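The column-family idea described above can be sketched with nested dictionaries. This is only an illustration of the data model, assuming made-up row keys and column names; it does not reflect how Cassandra, HBase, or Hypertable store data internally.

```python
# Row key -> {column name -> value}; each row carries only the columns it needs.
users = {
    "row1": {"name": "Ana", "email": "ana@example.com"},
    "row2": {"name": "Bob"},
}


def add_column(family, row_key, column, value):
    """Add a column to a single row without touching any other row."""
    family.setdefault(row_key, {})[column] = value


add_column(users, "row2", "twitter", "@bob")
columns_per_row = {key: sorted(row) for key, row in users.items()}
```

After the call, `row2` has a `twitter` column that `row1` never declared, which is the difference from an RDBMS table that the slide points out.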
Graph Databases 
Neo4j, InfiniteGraph, OrientDB, FlockDB 
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
Map Reduce
Sharding 
http://dbshards.com/articles/database-sharding-configuration/
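One common way to shard is to hash each key and use the result to pick a shard. The sketch below assumes four in-memory dictionaries standing in for database servers; `shard_for`, `put`, and `get` are hypothetical helper names, and real systems add rebalancing and replication on top of this.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Four dictionaries stand in for four separate database servers.
shards = [dict() for _ in range(4)]


def put(key, value):
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    return shards[shard_for(key, len(shards))].get(key)


for user_id in range(100):
    put(f"user:{user_id}", {"id": user_id})
```

Note that with plain modulo routing, changing the number of shards moves most keys to a different shard, which is one reason some systems use consistent hashing instead.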
MongoDB 
Document-oriented system 
Large Data Sets 
Records similar to JSON 
Automatic sharding and MapReduce 
Queries are written in JavaScript
CouchDB 
Document-oriented system 
JavaScript Interface 
Multi-version concurrency control approach 
Client side needs to handle clashes on writes 
No good built-in method for horizontal scalability 
(but there are external solutions like BigCouch, Lounge, and Pillow)
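The bullets about multi-version concurrency control and client-side handling of write clashes can be sketched as follows. This is a simplified model with integer revisions; `RevisionedStore`, `ConflictError`, and `update_with_retry` are names invented for the illustration and are not the real database's API or revision format.

```python
class ConflictError(Exception):
    """Raised when a write is based on an outdated revision."""


class RevisionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_rev=0):
        # The write is accepted only if the caller saw the latest revision.
        current_rev, _ = self._docs.get(doc_id, (0, None))
        if current_rev != expected_rev:
            raise ConflictError(f"{doc_id}: expected rev {expected_rev}, found {current_rev}")
        self._docs[doc_id] = (current_rev + 1, body)
        return current_rev + 1


def update_with_retry(store, doc_id, change, attempts=3):
    """Client-side clash handling: re-read, re-apply the change, and try again."""
    for _ in range(attempts):
        rev, body = store.get(doc_id)
        try:
            return store.put(doc_id, change(dict(body)), expected_rev=rev)
        except ConflictError:
            continue
    raise ConflictError(f"{doc_id}: gave up after {attempts} attempts")


store = RevisionedStore()
first_rev = store.put("cart", {"items": 1})
second_rev = update_with_retry(store, "cart", lambda doc: {**doc, "items": doc["items"] + 1})
```

The server never locks the document; it only rejects writes built on a stale revision, so resolving the clash is left to the client.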
Cassandra 
Originally an internal Facebook project 
Keyspaces and column families 
Similar to the data model used by BigTable 
Data is sharded and balanced automatically
Redis 
Keeps the entire database in RAM 
Its values can be complex data structures
BIG TABLE 
Structure: tables, row keys, column families, column 
names, timestamps, and cell values 
Designed to handle very large data loads by running on 
big clusters of commodity hardware 
Uses GFS as its underlying storage
HBase 
Open source clone of BigTable 
Same structure as BigTable 
Uses HDFS instead of GFS
Hypertable 
Another open source BigTable clone, written in C++ 
Focus on high performance
DynamoDB 
Key Value System 
Large Distributed Clusters 
Versioning
AWS S3 
HTTP 
Blobs
Riak 
Inspired by Amazon's Dynamo 
OpenSource and Commercial Versions 
Key Value System 
Large Distributed Clusters 
Queries in Erlang or JavaScript 
Consistent hashing and a gossip protocol to avoid a centralized index server
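A minimal sketch of consistent hashing, assuming three nodes named `a`, `b`, and `c` and 100 virtual points per node. `HashRing` is a hypothetical class written for this illustration; it shows only the key-to-node mapping and leaves out the gossip protocol and replication.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring for a string."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out the load
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["a", "b", "c"])
keys = [f"key{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("c")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every client can compute `node_for` locally from the list of nodes, so no central index is needed, and removing a node relocates only the keys that node owned.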
Hadoop 
Takes care of running your code across a cluster of machines. 
- chunking up the input data 
- sending it to each machine 
- running your code on each chunk 
- checking that the code ran 
- passing any results on to the next stage 
- sorting between stages 
- sending each chunk of that sorted data to the right machine 
- writing debugging information on each job’s progress
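The steps listed above can be imitated in a single Python process to show how the stages fit together. This is a toy word count, not a real cluster job: `run_job`, `map_words`, and `sum_counts` are hypothetical names, everything runs on one machine, and the failure checking and progress logging steps are left out.

```python
from collections import defaultdict
from itertools import islice


def chunked(items, size):
    """Split the input into chunks, as a cluster would before distributing work."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def map_words(line):
    """Map stage: emit a (word, 1) pair for every word in a line."""
    for word in line.split():
        yield (word.lower(), 1)


def sum_counts(key, values):
    """Reduce stage: combine all values collected for one key."""
    return sum(values)


def run_job(records, mapper, reducer, chunk_size=2):
    # Run the mapper on each chunk (on a cluster, one chunk per machine).
    mapped = []
    for chunk in chunked(records, chunk_size):
        for record in chunk:
            mapped.extend(mapper(record))
    # Sort between stages so that all values for a key end up together.
    groups = defaultdict(list)
    for key, value in sorted(mapped):
        groups[key].append(value)
    # Pass each group to the reducer.
    return {key: reducer(key, values) for key, values in groups.items()}


counts = run_job(["big data", "big clusters", "data"], map_words, sum_counts)
```

The user supplies only `map_words` and `sum_counts`; the chunking, sorting, and routing in `run_job` stand in for the work the framework takes off your hands.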
With Hive, you can program 
Hadoop jobs using HiveQL, 
a SQL-like language.
Apache Pig 
A procedural data 
processing language 
designed for Hadoop 
Provides a set of functions 
that help with common 
data processing problems
PigPen is map-reduce for Clojure, or distributed Clojure. 
It compiles to Apache Pig, but you don't need to know 
much about Pig to use it.
Hadoop for Logs 
The Flume project is designed to make the data 
gathering process easy and scalable, by running agents 
on the source machines that pass the data updates to 
collectors, which then aggregate them into large chunks 
that can be efficiently written as HDFS files.
The R project is both a 
specialized language and a 
toolkit of modules aimed at 
anyone working with 
statistics.
Lucene is a Java library 
that handles indexing and 
searching large collections 
of documents, and Solr is 
an application that uses 
the library to build a 
search engine server.
Mahout is an open source 
framework that can run 
common machine learning 
algorithms on massive 
datasets. 
The framework makes it 
easy to use analysis 
techniques to implement 
features such as “People 
who bought this also 
bought” recommendation 
engine on your own site.
ZooKeeper 
Coordinates work 
and configuration of 
different Clusters
Serialization 
As you pass data between systems, you need to store it in files at some point
JSON 
BSON (Binary JSON) 
Apache Thrift (predefined structure) 
Apache Avro (predefined structure)
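The difference between a self-describing format and a predefined structure can be sketched with the Python standard library alone. The `struct` layout below is only a stand-in for the idea of a schema agreed on in advance; it is not the Thrift or Avro wire format, and the record fields are made up for the example.

```python
import json
import struct

record = {"user_id": 42, "score": 3.5}

# JSON is self-describing: field names travel with every record.
json_bytes = json.dumps(record, sort_keys=True).encode("utf-8")
decoded_json = json.loads(json_bytes)

# Predefined structure: both sides agree beforehand that a record is an
# unsigned 32-bit integer followed by a 64-bit float, in big-endian order.
LAYOUT = struct.Struct(">Id")
binary_bytes = LAYOUT.pack(record["user_id"], record["score"])
user_id, score = LAYOUT.unpack(binary_bytes)
decoded_binary = {"user_id": user_id, "score": score}
```

Both encodings round-trip the same record, but the binary form carries no field names, so it is smaller and can only be read by code that knows the layout.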
blog.andrefaria.com 
datavisionary.net 
andrefaria.com @andrefaria

More Related Content

What's hot

Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
Anil Gupta
 
Cloud Strategy Architecture for multi country deployment
Cloud Strategy Architecture for multi country deploymentCloud Strategy Architecture for multi country deployment
Cloud Strategy Architecture for multi country deployment
Sandeep Sharma IIMK Smart City,IoT,Bigdata,Cloud,BI,DW
 
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data AnalysisApache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Trieu Nguyen
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
葵慶 李
 
Doug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop EcosystemDoug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop Ecosystem
Cloudera, Inc.
 
Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache Hadoop
Sufi Nawaz
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Prashant Gupta
 
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
AyeeshaParveen
 
Escalando Aplicaciones Web
Escalando Aplicaciones WebEscalando Aplicaciones Web
Escalando Aplicaciones Web
Santiago Coffey
 
Data science-toolchain
Data science-toolchainData science-toolchain
Data science-toolchain
Jie-Han Chen
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
Kumaresan Manickavelu
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
Microsoft TechNet - Belgium and Luxembourg
 
Intro to bigdata on gcp (1)
Intro to bigdata on gcp (1)Intro to bigdata on gcp (1)
Intro to bigdata on gcp (1)
SahilRaina21
 
Mongo db
Mongo dbMongo db
Hadoop An Introduction
Hadoop An IntroductionHadoop An Introduction
Hadoop An Introduction
Mohanasundaram Ponnusamy
 
Open source big data landscape and possible ITS applications
Open source big data landscape and possible ITS applicationsOpen source big data landscape and possible ITS applications
Open source big data landscape and possible ITS applications
SoftwareMill
 
Hadoop and friends
Hadoop and friendsHadoop and friends
Hadoop and friends
Chandan Rajah
 
Hadoop training by keylabs
Hadoop training by keylabsHadoop training by keylabs
Hadoop training by keylabs
Siva Sankar
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
faizrashid1995
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
datastack
 

What's hot (20)

Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
Cloud Strategy Architecture for multi country deployment
Cloud Strategy Architecture for multi country deploymentCloud Strategy Architecture for multi country deployment
Cloud Strategy Architecture for multi country deployment
 
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data AnalysisApache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Doug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop EcosystemDoug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop Ecosystem
 
Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache Hadoop
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
 
Escalando Aplicaciones Web
Escalando Aplicaciones WebEscalando Aplicaciones Web
Escalando Aplicaciones Web
 
Data science-toolchain
Data science-toolchainData science-toolchain
Data science-toolchain
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
Intro to bigdata on gcp (1)
Intro to bigdata on gcp (1)Intro to bigdata on gcp (1)
Intro to bigdata on gcp (1)
 
Mongo db
Mongo dbMongo db
Mongo db
 
Hadoop An Introduction
Hadoop An IntroductionHadoop An Introduction
Hadoop An Introduction
 
Open source big data landscape and possible ITS applications
Open source big data landscape and possible ITS applicationsOpen source big data landscape and possible ITS applications
Open source big data landscape and possible ITS applications
 
Hadoop and friends
Hadoop and friendsHadoop and friends
Hadoop and friends
 
Hadoop training by keylabs
Hadoop training by keylabsHadoop training by keylabs
Hadoop training by keylabs
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 

Viewers also liked

Explosao de dados e o conceito de análise de dados relacionados para geração ...
Explosao de dados e o conceito de análise de dados relacionados para geração ...Explosao de dados e o conceito de análise de dados relacionados para geração ...
Explosao de dados e o conceito de análise de dados relacionados para geração ...
Felipe Pereira
 
Ecossistema Hadoop no Magazine Luiza
Ecossistema Hadoop no Magazine LuizaEcossistema Hadoop no Magazine Luiza
Ecossistema Hadoop no Magazine Luiza
Nelson Forte
 
Big Data’s Big Impact on Businesses
Big Data’s Big Impact on BusinessesBig Data’s Big Impact on Businesses
Big Data’s Big Impact on Businesses
CRISIL Limited
 
Resumo: O Sucesso Tem Fórmula? (by Celso Silvati)
Resumo: O Sucesso Tem Fórmula? (by Celso Silvati)Resumo: O Sucesso Tem Fórmula? (by Celso Silvati)
Resumo: O Sucesso Tem Fórmula? (by Celso Silvati)
Celso Silvati
 
Innovation - Think outside the box
Innovation - Think outside the boxInnovation - Think outside the box
Innovation - Think outside the box
André Faria Gomes
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
André Faria Gomes
 
ABC Algorithm.
ABC Algorithm.ABC Algorithm.
ABC Algorithm.
N Vinayak
 
Objetividade: A Virtude Esquecida
Objetividade: A Virtude EsquecidaObjetividade: A Virtude Esquecida
Objetividade: A Virtude Esquecida
André Faria Gomes
 
Introduction to Getting Things Done (GTD)
Introduction to Getting Things Done (GTD)Introduction to Getting Things Done (GTD)
Introduction to Getting Things Done (GTD)
André Faria Gomes
 
Big Data - O que é o hadoop, map reduce, hdfs e hive
Big Data - O que é o hadoop, map reduce, hdfs e hiveBig Data - O que é o hadoop, map reduce, hdfs e hive
Big Data - O que é o hadoop, map reduce, hdfs e hive
Flavio Fonte, PMP, ITIL
 
Lições de empreendedorismo com Flávio Augusto
Lições de empreendedorismo com Flávio AugustoLições de empreendedorismo com Flávio Augusto
Lições de empreendedorismo com Flávio Augusto
André Faria Gomes
 
The Secret To Success Is Your Mindset
The Secret To Success Is Your MindsetThe Secret To Success Is Your Mindset
The Secret To Success Is Your Mindset
Justin Bryant
 
Followership
FollowershipFollowership
Followership
André Faria Gomes
 
Pensando Rápido e Devagar
Pensando Rápido e DevagarPensando Rápido e Devagar
Pensando Rápido e Devagar
André Faria Gomes
 
Apresentacao+dale+carnegie
Apresentacao+dale+carnegieApresentacao+dale+carnegie
Apresentacao+dale+carnegiempedroso2011
 
Capital de Giro e Ciclo Financeiro
Capital de Giro e Ciclo FinanceiroCapital de Giro e Ciclo Financeiro
Capital de Giro e Ciclo Financeiro
André Faria Gomes
 
Os 7 hábitos das pessoas altamente eficazes
Os 7 hábitos das pessoas altamente eficazesOs 7 hábitos das pessoas altamente eficazes
Os 7 hábitos das pessoas altamente eficazes
André Faria Gomes
 
Success mindset slideshare
Success mindset slideshareSuccess mindset slideshare
Success mindset slideshareW5 Coaching
 
Felipe Faias - Como fazer amigos e influenciar pessoas
Felipe Faias - Como fazer amigos e influenciar pessoasFelipe Faias - Como fazer amigos e influenciar pessoas
Felipe Faias - Como fazer amigos e influenciar pessoas
Felipe Faias
 

Viewers also liked (20)

Explosao de dados e o conceito de análise de dados relacionados para geração ...
Explosao de dados e o conceito de análise de dados relacionados para geração ...Explosao de dados e o conceito de análise de dados relacionados para geração ...
Explosao de dados e o conceito de análise de dados relacionados para geração ...
 
Ecossistema Hadoop no Magazine Luiza
Ecossistema Hadoop no Magazine LuizaEcossistema Hadoop no Magazine Luiza
Ecossistema Hadoop no Magazine Luiza
 
Big Data’s Big Impact on Businesses
Big Data’s Big Impact on BusinessesBig Data’s Big Impact on Businesses
Big Data’s Big Impact on Businesses
 
Resumo: O Sucesso Tem Fórmula? (by Celso Silvati)
Resumo: O Sucesso Tem Fórmula? (by Celso Silvati)Resumo: O Sucesso Tem Fórmula? (by Celso Silvati)
Resumo: O Sucesso Tem Fórmula? (by Celso Silvati)
 
Innovation - Think outside the box
Innovation - Think outside the boxInnovation - Think outside the box
Innovation - Think outside the box
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
ABC Algorithm.
ABC Algorithm.ABC Algorithm.
ABC Algorithm.
 
Real options
Real optionsReal options
Real options
 
Objetividade: A Virtude Esquecida
Objetividade: A Virtude EsquecidaObjetividade: A Virtude Esquecida
Objetividade: A Virtude Esquecida
 
Introduction to Getting Things Done (GTD)
Introduction to Getting Things Done (GTD)Introduction to Getting Things Done (GTD)
Introduction to Getting Things Done (GTD)
 
Big Data - O que é o hadoop, map reduce, hdfs e hive
Big Data - O que é o hadoop, map reduce, hdfs e hiveBig Data - O que é o hadoop, map reduce, hdfs e hive
Big Data - O que é o hadoop, map reduce, hdfs e hive
 
Lições de empreendedorismo com Flávio Augusto
Lições de empreendedorismo com Flávio AugustoLições de empreendedorismo com Flávio Augusto
Lições de empreendedorismo com Flávio Augusto
 
The Secret To Success Is Your Mindset
The Secret To Success Is Your MindsetThe Secret To Success Is Your Mindset
The Secret To Success Is Your Mindset
 
Followership
FollowershipFollowership
Followership
 
Pensando Rápido e Devagar
Pensando Rápido e DevagarPensando Rápido e Devagar
Pensando Rápido e Devagar
 
Apresentacao+dale+carnegie
Apresentacao+dale+carnegieApresentacao+dale+carnegie
Apresentacao+dale+carnegie
 
Capital de Giro e Ciclo Financeiro
Capital de Giro e Ciclo FinanceiroCapital de Giro e Ciclo Financeiro
Capital de Giro e Ciclo Financeiro
 
Os 7 hábitos das pessoas altamente eficazes
Os 7 hábitos das pessoas altamente eficazesOs 7 hábitos das pessoas altamente eficazes
Os 7 hábitos das pessoas altamente eficazes
 
Success mindset slideshare
Success mindset slideshareSuccess mindset slideshare
Success mindset slideshare
 
Felipe Faias - Como fazer amigos e influenciar pessoas
Felipe Faias - Como fazer amigos e influenciar pessoasFelipe Faias - Como fazer amigos e influenciar pessoas
Felipe Faias - Como fazer amigos e influenciar pessoas
 

Similar to The ABC of Big Data

In15orlesss hadoop
In15orlesss hadoopIn15orlesss hadoop
In15orlesss hadoop
Worapol Alex Pongpech, PhD
 
No sql databases
No sql databasesNo sql databases
No sql databases
Walaa Hamdy Assy
 
What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?
tommychauhan
 
Selecting best NoSQL
Selecting best NoSQL Selecting best NoSQL
Selecting best NoSQL
Mohammed Fazuluddin
 
Big Data , Big Problem?
Big Data , Big Problem?Big Data , Big Problem?
Big Data , Big Problem?
Mohammadhasan Farazmand
 
Building Big data solutions in Azure
Building Big data solutions in AzureBuilding Big data solutions in Azure
Building Big data solutions in Azure
Mostafa
 
Big data solutions in Azure
Big data solutions in AzureBig data solutions in Azure
Big data solutions in Azure
Mostafa
 
Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : Nutshell
Khalid Imran
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010
Yahoo Developer Network
 
Hadoop and object stores can we do it better
Hadoop and object stores  can we do it betterHadoop and object stores  can we do it better
Hadoop and object stores can we do it better
gvernik
 
Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?
gvernik
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needs
gagravarr
 
Unit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxUnit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptx
BhavanaHotchandani
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
Chirag Ahuja
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
Jonathan Bloom
 

Similar to The ABC of Big Data (20)

In15orlesss hadoop
In15orlesss hadoopIn15orlesss hadoop
In15orlesss hadoop
 
No sql databases
No sql databasesNo sql databases
No sql databases
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?
 
Selecting best NoSQL
Selecting best NoSQL Selecting best NoSQL
Selecting best NoSQL
 
Big Data , Big Problem?
Big Data , Big Problem?Big Data , Big Problem?
Big Data , Big Problem?
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Building Big data solutions in Azure
Building Big data solutions in AzureBuilding Big data solutions in Azure
Building Big data solutions in Azure
 
Big data solutions in Azure
Big data solutions in AzureBig data solutions in Azure
Big data solutions in Azure
 
Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : Nutshell
 
Oslo bekk2014
Oslo bekk2014Oslo bekk2014
Oslo bekk2014
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010
 
Hadoop and object stores can we do it better
Hadoop and object stores  can we do it betterHadoop and object stores  can we do it better
Hadoop and object stores can we do it better
 
Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needs
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
 
Unit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxUnit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptx
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 

More from André Faria Gomes

Meetup Escale - Gestão para Equipes de Alta Performance
Meetup Escale - Gestão para Equipes de Alta PerformanceMeetup Escale - Gestão para Equipes de Alta Performance
Meetup Escale - Gestão para Equipes de Alta Performance
André Faria Gomes
 
Protagonistas da inovação - Como criar e gerir os negócios do futuro
Protagonistas da inovação - Como criar e gerir os negócios do futuroProtagonistas da inovação - Como criar e gerir os negócios do futuro
Protagonistas da inovação - Como criar e gerir os negócios do futuro
André Faria Gomes
 
A Mobilidade como Propulsor da Transformação Digital
A Mobilidade como Propulsor da Transformação DigitalA Mobilidade como Propulsor da Transformação Digital
A Mobilidade como Propulsor da Transformação Digital
André Faria Gomes
 
Além da Agilidade 2019 - KickOff Wow
Além da Agilidade 2019 - KickOff WowAlém da Agilidade 2019 - KickOff Wow
Além da Agilidade 2019 - KickOff Wow
André Faria Gomes
 
Modern systems architectures: Uber, Lyft, Cabify
Modern systems architectures: Uber, Lyft, CabifyModern systems architectures: Uber, Lyft, Cabify
Modern systems architectures: Uber, Lyft, Cabify
André Faria Gomes
 
Breaking the monolith
Breaking the monolithBreaking the monolith
Breaking the monolith
André Faria Gomes
 
Agilidade - APAS
Agilidade - APASAgilidade - APAS
Agilidade - APAS
André Faria Gomes
 
Principles and Radical Transparency - Lessons Learned from Ray Dalio
Principles and Radical Transparency - Lessons Learned from Ray DalioPrinciples and Radical Transparency - Lessons Learned from Ray Dalio
Principles and Radical Transparency - Lessons Learned from Ray Dalio
André Faria Gomes
 
Bluesoft @ AWS re:Invent 2017 + AWS 101
Bluesoft @ AWS re:Invent 2017 + AWS 101Bluesoft @ AWS re:Invent 2017 + AWS 101
Bluesoft @ AWS re:Invent 2017 + AWS 101
André Faria Gomes
 
Boas Práticas da Rede Supermercadista Wegmans
Boas Práticas da Rede Supermercadista WegmansBoas Práticas da Rede Supermercadista Wegmans
Boas Práticas da Rede Supermercadista Wegmans
André Faria Gomes
 
Boas Práticas para Supermercadistas inspiradas no Whole Foods, Sprouts Marke...
Boas Práticas para Supermercadistas inspiradas no Whole Foods, Sprouts Marke...Boas Práticas para Supermercadistas inspiradas no Whole Foods, Sprouts Marke...
Boas Práticas para Supermercadistas inspiradas no Whole Foods, Sprouts Marke...
André Faria Gomes
 
Change management - Kotter’s eight-step model
Change management - Kotter’s eight-step model Change management - Kotter’s eight-step model
Change management - Kotter’s eight-step model
André Faria Gomes
 
Palestra na Uninove sobre Agilidade
Palestra na Uninove sobre AgilidadePalestra na Uninove sobre Agilidade
Palestra na Uninove sobre Agilidade
André Faria Gomes
 
What happened to Google Reader?
What happened to Google Reader?What happened to Google Reader?
What happened to Google Reader?
André Faria Gomes
 
Gestão Ágil com Management 3.0
Gestão Ágil com Management 3.0Gestão Ágil com Management 3.0
Gestão Ágil com Management 3.0
André Faria Gomes
 
Lições aprendidas em 10 anos de agilidade
Lições aprendidas em 10 anos de agilidadeLições aprendidas em 10 anos de agilidade
Lições aprendidas em 10 anos de agilidade
André Faria Gomes
 
Bematech IFRS
Bematech IFRSBematech IFRS
Bematech IFRS
André Faria Gomes
 
Tips for SaaS Sales Team
Tips for SaaS Sales TeamTips for SaaS Sales Team
Tips for SaaS Sales Team
André Faria Gomes
 
Atendimento Campeão
Atendimento CampeãoAtendimento Campeão
Atendimento Campeão
André Faria Gomes
 
Big Ideias about Spotify Culture
Big Ideias about Spotify CultureBig Ideias about Spotify Culture
Big Ideias about Spotify Culture
André Faria Gomes
 

More from André Faria Gomes (20)

Meetup Escale - Gestão para Equipes de Alta Performance
Meetup Escale - Gestão para Equipes de Alta PerformanceMeetup Escale - Gestão para Equipes de Alta Performance
Meetup Escale - Gestão para Equipes de Alta Performance
 
Protagonistas da inovação - Como criar e gerir os negócios do futuro
Protagonistas da inovação - Como criar e gerir os negócios do futuroProtagonistas da inovação - Como criar e gerir os negócios do futuro
Protagonistas da inovação - Como criar e gerir os negócios do futuro
 
A Mobilidade como Propulsor da Transformação Digital
A Mobilidade como Propulsor da Transformação DigitalA Mobilidade como Propulsor da Transformação Digital
A Mobilidade como Propulsor da Transformação Digital
 
Além da Agilidade 2019 - KickOff Wow
Além da Agilidade 2019 - KickOff WowAlém da Agilidade 2019 - KickOff Wow
Além da Agilidade 2019 - KickOff Wow
 
Modern systems architectures: Uber, Lyft, Cabify
Modern systems architectures: Uber, Lyft, CabifyModern systems architectures: Uber, Lyft, Cabify
Modern systems architectures: Uber, Lyft, Cabify
 
Breaking the monolith
Breaking the monolithBreaking the monolith
Breaking the monolith
 
Agilidade - APAS
Agilidade - APASAgilidade - APAS
Agilidade - APAS
 
Principles and Radical Transparency - Lessons Learned from Ray Dalio
Principles and Radical Transparency - Lessons Learned from Ray DalioPrinciples and Radical Transparency - Lessons Learned from Ray Dalio
Principles and Radical Transparency - Lessons Learned from Ray Dalio
 
Bluesoft @ AWS re:Invent 2017 + AWS 101
Bluesoft @ AWS re:Invent 2017 + AWS 101Bluesoft @ AWS re:Invent 2017 + AWS 101
Bluesoft @ AWS re:Invent 2017 + AWS 101
 
Boas Práticas da Rede Supermercadista Wegmans
Boas Práticas da Rede Supermercadista WegmansBoas Práticas da Rede Supermercadista Wegmans
Boas Práticas da Rede Supermercadista Wegmans
 
Boas Práticas para Supermercadistas inspiradas no Whole Foods, Sprouts Marke...
Boas Práticas para Supermercadistas inspiradas no Whole Foods, Sprouts Marke...Boas Práticas para Supermercadistas inspiradas no Whole Foods, Sprouts Marke...
Boas Práticas para Supermercadistas inspiradas no Whole Foods, Sprouts Marke...
 
Change management - Kotter’s eight-step model
Change management - Kotter’s eight-step model Change management - Kotter’s eight-step model
Change management - Kotter’s eight-step model
 
Palestra na Uninove sobre Agilidade
Palestra na Uninove sobre AgilidadePalestra na Uninove sobre Agilidade
Palestra na Uninove sobre Agilidade
 
What happened to Google Reader?
What happened to Google Reader?What happened to Google Reader?
What happened to Google Reader?
 
Gestão Ágil com Management 3.0
Gestão Ágil com Management 3.0Gestão Ágil com Management 3.0
Gestão Ágil com Management 3.0
 
Lições aprendidas em 10 anos de agilidade
Lições aprendidas em 10 anos de agilidadeLições aprendidas em 10 anos de agilidade
Lições aprendidas em 10 anos de agilidade
 
Bematech IFRS
Bematech IFRSBematech IFRS
Bematech IFRS
 
Tips for SaaS Sales Team
Tips for SaaS Sales TeamTips for SaaS Sales Team
Tips for SaaS Sales Team
 
Atendimento Campeão
Atendimento CampeãoAtendimento Campeão
Atendimento Campeão
 
Big Ideias about Spotify Culture
Big Ideias about Spotify CultureBig Ideias about Spotify Culture
Big Ideias about Spotify Culture
 

Recently uploaded

Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
The ABC of Big Data

  • 1. Big Data ABC @andrefaria
  • 2.
  • 5. Key Value Stores: Riak, Memcached, Berkeley DB, HamsterDB, Couchbase, Voldemort, DynamoDB. Source: http://www.thoughtworks.com/insights/blog/nosql-databases-overview
  • 6. Document Stores: think of document databases as key-value stores where the value is examinable. They offer a rich query language plus indexes. Examples: MongoDB, CouchDB, Terrastore, OrientDB, RavenDB. Source: http://www.thoughtworks.com/insights/blog/nosql-databases-overview
  • 7. Column Family Stores: each column family can be compared to a container of rows in an RDBMS table, where the key identifies the row and the row consists of multiple columns. The difference is that rows do not have to have the same columns, and a column can be added to any row at any time without having to add it to other rows. Examples: Cassandra, HBase, Hypertable. Source: http://www.thoughtworks.com/insights/blog/nosql-databases-overview
  • 8. Graph Databases: Neo4J, Infinite Graph, OrientDB, FlockDB. Source: http://www.thoughtworks.com/insights/blog/nosql-databases-overview
  • 11. [MongoDB] Document-oriented system for large data sets. Records are similar to JSON. Automatic sharding and MapReduce. Queries are written in JavaScript.
  • 12. [CouchDB] Document-oriented system with a JavaScript interface. Uses a multi-version concurrency control approach, so the client side needs to handle clashes on writes. No good built-in method for horizontal scalability (but there are external solutions like BigCouch, Lounge, and Pillow).
  • 13. [Cassandra] Originally an internal Facebook project. Organized into keyspaces and column families, similar to the data model used by BigTable. Data is sharded and balanced automatically.
  • 14. [Redis] Keeps the entire database in RAM. Its values can be complex data structures.
  • 15. BigTable. Structure: tables, row keys, column families, column names, timestamps, and cell values. Designed to handle very large data loads by running on big clusters of commodity hardware. Uses GFS as its underlying storage.
  • 16. [HBase] Open source clone of BigTable with the same structure. Uses HDFS instead of GFS.
  • 17. [Hypertable] Another open source BigTable clone, written in C++. Focus on high performance.
  • 18. DynamoDB. Key-value system. Large distributed clusters. Versioning.
  • 19. AWS S3: HTTP, blobs.
  • 20. [Riak] Inspired by Amazon's Dynamo. Open source and commercial versions. Key-value system for large distributed clusters. Queries in Erlang or JavaScript. Uses consistent hashing and a gossip protocol to avoid a centralized index server.
  • 21. [Hadoop] Takes care of running your code across a cluster of machines:
      - chunking up the input data
      - sending it to each machine
      - running your code on each chunk
      - checking that the code ran
      - passing any results on to the next stage
      - sorting between stages
      - sending each chunk of that sorted data to the right machine
      - writing debugging information on each job’s progress
  • 22. With Hive, you can program Hadoop jobs using HiveQL, a SQL-like language.
  • 23. Apache Pig: a procedural data processing language designed for Hadoop. It provides a set of functions that help with common data processing problems.
  • 24. PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig, but you don't need to know much about Pig to use it.
  • 25. Hadoop for logs: the Flume project is designed to make the data gathering process easy and scalable, by running agents on the source machines that pass the data updates to collectors, which then aggregate them into large chunks that can be efficiently written as HDFS files.
  • 26. The R project is both a specialized language and a toolkit of modules aimed at anyone working with statistics.
  • 27. Lucene is a Java library that handles indexing and searching large collections of documents, and Solr is an application that uses the library to build a search engine server.
  • 28. Mahout is an open source framework that can run common machine learning algorithms on massive datasets. The framework makes it easy to use analysis techniques to implement features such as a “People who bought this also bought” recommendation engine on your own site.
  • 29. ZooKeeper: coordinates the work and configuration of different clusters.
  • 30. Serialization: needed as you pass data between systems and store it in files at some points.
  • 31. JSON; BSON (Binary JSON); Apache Thrift (predefined structure); Apache Avro (predefined structure).
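
Slide 7 describes a column family as a container of rows where each row can carry its own set of columns. A minimal sketch of that idea, assuming a plain in-memory Python dict (the names `users`, `add_column`, and `columns_of` are hypothetical and not part of any real database API):

```python
from typing import Dict, List

# Hypothetical in-memory model: row key -> {column name -> value}.
ColumnFamily = Dict[str, Dict[str, str]]

users: ColumnFamily = {
    "user:1": {"name": "Ana", "email": "ana@example.com"},
    "user:2": {"name": "Bruno"},  # this row has no "email" column at all
}


def add_column(family: ColumnFamily, row_key: str, column: str, value: str) -> None:
    """Add a column to one row only; no other row is touched."""
    family.setdefault(row_key, {})[column] = value


def columns_of(family: ColumnFamily, row_key: str) -> List[str]:
    """Return the sorted column names present in a single row."""
    return sorted(family.get(row_key, {}))


# Only "user:2" gains the new column.
add_column(users, "user:2", "twitter", "@bruno")
```

Unlike an RDBMS table, no schema change is needed here: the new column exists only in the row that received it.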
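
Slide 12 notes that with a multi-version concurrency control approach the client has to handle clashes on writes. One way this can look from the client side is an optimistic revision check, sketched below. `DocStore`, `ConflictError`, and the integer revision numbers are assumptions made for illustration and do not mirror any real database's API:

```python
from typing import Dict, Optional, Tuple


class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class DocStore:
    """Hypothetical document store that rejects writes against stale revisions."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}  # doc_id -> (revision, body)

    def get(self, doc_id: str) -> Tuple[int, dict]:
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id: str, body: dict, expected_rev: Optional[int] = None) -> int:
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if current_rev != expected_rev:
            # Someone else wrote first; the client must re-read and retry.
            raise ConflictError(f"expected rev {expected_rev}, found {current_rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = DocStore()
rev1 = store.put("doc", {"count": 1})                     # first write
rev2 = store.put("doc", {"count": 2}, expected_rev=rev1)  # based on the latest revision

try:
    store.put("doc", {"count": 99}, expected_rev=rev1)    # stale revision
    clash_detected = False
except ConflictError:
    clash_detected = True
```

After a clash, a typical client re-reads the document, merges its change, and retries with the fresh revision.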
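
Slide 20 mentions consistent hashing as a way to avoid a centralized index server. The sketch below shows the general technique under stated assumptions: SHA-256 as the hash, 100 virtual points per node, and a hypothetical `HashRing` class. It is not the implementation used by any particular database.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual points per node."""

    def __init__(self, nodes: Iterable[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._points: List[Tuple[int, str]] = []  # kept sorted by hash
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no nodes")
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("node-c")
after = {k: ring.node_for(k) for k in keys}
```

Every client can compute `node_for` locally, and removing a node only moves the keys that node owned. Keys on the surviving nodes stay where they were.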
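
The stages listed on slide 21 (chunking, running code on each chunk, sorting between stages, passing results on) can be sketched in a single process as a word count. This is only an illustration of the map, shuffle, and reduce flow. It runs on one machine, does no failure checking, and the function names are hypothetical:

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Run the user's code on one chunk: emit (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Sort between stages: group all values by key, in key order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(chunks: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# The input is already "chunked" into two pieces here.
counts = run_job(["big data big", "data tools"])
# counts == {"big": 2, "data": 2, "tools": 1}
```

On a real cluster each chunk would be mapped on a different machine and each key group sent to the machine responsible for reducing it.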
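
Slide 31 contrasts JSON with formats that rely on a predefined structure. The sketch below illustrates that trade-off using only the Python standard library. The fixed `struct` layout stands in for the idea of a schema agreed in advance and is not the actual wire format of Thrift or Avro. The record and its field names are made up for the example.

```python
import json
import struct

record = {"id": 42, "score": 0.5}

# Self-describing: field names travel with every record (the JSON approach).
json_bytes = json.dumps(record, sort_keys=True).encode("utf-8")
round_tripped = json.loads(json_bytes.decode("utf-8"))

# Predefined structure: both sides agree on the layout up front, so only
# the values are written. "<Id" = little-endian unsigned 32-bit int + 64-bit float.
SCHEMA = struct.Struct("<Id")
packed = SCHEMA.pack(record["id"], record["score"])
unpacked_id, unpacked_score = SCHEMA.unpack(packed)
```

The packed form is smaller because the field names live in the schema rather than in the data, but a reader without the schema cannot interpret it.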