The ABC of Big Data
Big Data ABC @andrefaria
Concepts
Relational 
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
Key Value Stores 
Riak, Memcached, Berkeley DB, HamsterDB, Couchbase, Voldemort, DynamoDB
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
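At its core, a key-value store offers little more than put, get and delete on values the store treats as opaque. The sketch below is a toy in-memory version under that assumption. The `KeyValueStore` class is hypothetical and is not the API of any product listed above; real systems add persistence, replication and expiry.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store: values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value  # overwrites any previous value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by key only; the store never looks inside the value.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        """Remove the key; return True if it existed."""
        return self._data.pop(key, None) is not None


kv = KeyValueStore()
kv.put("session:42", b'{"user": "ana", "cart": 3}')
print(kv.get("session:42"))
```

The value happens to be JSON here, but the store cannot query by `user`; that limitation is what document stores address.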
Document Stores 
Think of document databases as key-value stores where the value is examinable
Rich query language + indexes
MongoDB, CouchDB, Terrastore, OrientDB, RavenDB
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
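"Examinable" means the store can look inside each value, index its fields and answer queries on them. A minimal sketch, assuming documents are Python dicts and an index is a map from field value to document ids; the `DocumentStore` class and its methods are hypothetical, not any product's API.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Set


class DocumentStore:
    """Toy document store: values are dicts the store can examine and index."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}
        # field name -> field value -> ids of documents with that value
        self._indexes: Dict[str, DefaultDict[Any, Set[str]]] = {}

    def create_index(self, field: str) -> None:
        index: DefaultDict[Any, Set[str]] = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def insert(self, doc_id: str, doc: Dict[str, Any]) -> None:
        old = self._docs.get(doc_id)
        for field, index in self._indexes.items():
            # Keep every index in sync when a document is replaced.
            if old is not None and field in old:
                index[old[field]].discard(doc_id)
            if field in doc:
                index[doc[field]].add(doc_id)
        self._docs[doc_id] = doc

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        """Return documents whose fields equal all given criteria."""
        candidates = None
        for field, value in criteria.items():
            if field in self._indexes:
                # Indexed field: look up matching ids instead of scanning.
                candidates = self._indexes[field].get(value, set())
                break
        if candidates is None:
            candidates = set(self._docs)  # no usable index: full scan
        return [
            self._docs[doc_id]
            for doc_id in sorted(candidates)
            if all(self._docs[doc_id].get(f) == v for f, v in criteria.items())
        ]


docs = DocumentStore()
docs.create_index("city")
docs.insert("u1", {"name": "Ana", "city": "Recife", "age": 31})
docs.insert("u2", {"name": "Bruno", "city": "Porto Alegre", "age": 25})
docs.insert("u3", {"name": "Carla", "city": "Recife", "age": 25})
print(docs.find(city="Recife", age=25))
```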
Column Family Stores 
Each column family can be compared to a container of rows in an RDBMS table, where the key identifies the row and the row consists of multiple columns. The difference is that different rows do not need to have the same columns, and a column can be added to any row at any time without adding it to other rows.
Cassandra, HBase, Hypertable 
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
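The sparse-row idea above can be pictured as a map of maps: row key to a dict of only the columns that row actually has. This is a hypothetical in-memory sketch of the data model only; it says nothing about how Cassandra, HBase or Hypertable store data on disk.

```python
from typing import Dict

# One column family: row key -> column name -> value.
ColumnFamily = Dict[str, Dict[str, str]]

users: ColumnFamily = {}


def put(cf: ColumnFamily, row_key: str, column: str, value: str) -> None:
    # Adding a column touches only this row; other rows are unaffected.
    cf.setdefault(row_key, {})[column] = value


put(users, "ana", "email", "ana@example.com")
put(users, "ana", "city", "Recife")
put(users, "bruno", "email", "bruno@example.com")
put(users, "bruno", "twitter", "@bruno")  # a column no other row has

for row_key, columns in sorted(users.items()):
    print(row_key, sorted(columns))
# ana ['city', 'email']
# bruno ['email', 'twitter']
```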
Graph Databases 
Neo4j, InfiniteGraph, OrientDB, FlockDB
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
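Graph databases store relationships directly, so questions like "how is Ana connected to Davi?" become traversals instead of chains of joins. A minimal sketch, assuming an undirected adjacency list and breadth-first search; the graph and function names are hypothetical and do not reflect any graph database's query language.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Undirected "knows" relationships stored as an adjacency list.
graph: Dict[str, Set[str]] = {}


def add_edge(a: str, b: str) -> None:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def shortest_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: shortest path by hop count, or None."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


for a, b in [("ana", "bruno"), ("bruno", "carla"), ("carla", "davi"),
             ("ana", "eva"), ("eva", "davi")]:
    add_edge(a, b)

print(shortest_path("ana", "davi"))  # ['ana', 'eva', 'davi']
```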
MapReduce
Sharding 
http://dbshards.com/articles/database-sharding-configuration/
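Sharding splits one logical dataset across several databases by a shard key. Hash-based routing is one common strategy; the sketch below assumes it and a hypothetical cluster of four shards. It uses a stable hash (MD5) rather than Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib
from typing import Dict, List

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash so every client routes the same key to the same shard.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shards: Dict[int, List[str]] = {i: [] for i in range(NUM_SHARDS)}
for customer_id in ["c-001", "c-002", "c-003", "c-004", "c-005", "c-006"]:
    shards[shard_for(customer_id)].append(customer_id)

print(shards)
```

A drawback of plain modulo routing is that changing `num_shards` moves most keys to a different shard, which is one reason Dynamo-style stores use consistent hashing instead.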
MongoDB
Document-oriented system
Large Data Sets
Records similar to JSON
Automatic sharding and MapReduce
Queries are written as JavaScript/JSON-like documents
CouchDB
Document-oriented system
JavaScript Interface
Multi-version concurrency control (MVCC) approach
The client side must handle conflicting writes
No good built-in method for horizontal scalability
(but there are external solutions like BigCouch, Lounge, and Pillow)
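With MVCC, each write must name the revision it was based on; if someone else wrote first, the write is rejected and the client has to re-read and merge. The sketch below assumes integer revisions for simplicity (CouchDB itself uses revision strings made of a generation number plus a hash, and answers a stale write with HTTP 409 Conflict). `VersionedStore`, `ConflictError` and `update_with_retry` are hypothetical names.

```python
from typing import Any, Callable, Dict, Tuple


class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class VersionedStore:
    """Toy MVCC-style store: every write must name the revision it is based on."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def get(self, doc_id: str) -> Tuple[int, Dict[str, Any]]:
        rev, doc = self._docs[doc_id]
        return rev, dict(doc)  # copy so callers cannot mutate stored data

    def put(self, doc_id: str, doc: Dict[str, Any], base_rev: int = 0) -> int:
        current_rev = self._docs.get(doc_id, (0, {}))[0]
        if base_rev != current_rev:
            raise ConflictError(f"{doc_id}: based on rev {base_rev}, current is {current_rev}")
        self._docs[doc_id] = (current_rev + 1, dict(doc))
        return current_rev + 1


def update_with_retry(store: VersionedStore, doc_id: str,
                      change: Callable[[Dict[str, Any]], Dict[str, Any]],
                      attempts: int = 3) -> int:
    """Client-side clash handling: re-read, re-apply the change, retry."""
    for _ in range(attempts):
        rev, doc = store.get(doc_id)
        try:
            return store.put(doc_id, change(doc), base_rev=rev)
        except ConflictError:
            continue
    raise ConflictError(f"{doc_id}: gave up after {attempts} attempts")


couch = VersionedStore()
rev1 = couch.put("cart:ana", {"items": []})
couch.put("cart:ana", {"items": ["book"]}, base_rev=rev1)  # client A wins: rev 2
stale_write_rejected = False
try:
    couch.put("cart:ana", {"items": ["pen"]}, base_rev=rev1)  # client B is stale
except ConflictError:
    stale_write_rejected = True
# Client B re-reads and applies its change on top of A's.
final_rev = update_with_retry(couch, "cart:ana", lambda d: {"items": d["items"] + ["pen"]})
```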
Cassandra
Originally an internal Facebook project
Keyspaces and column families
Similar to the data model used by BigTable
Data is sharded and balanced automatically
Redis
Keeps the entire database in RAM
Its values can be complex data structures
BigTable
Structure: tables, row keys, column families, column names, timestamps, and cell values
Designed to handle very large data loads by running on big clusters of commodity hardware
Uses GFS (Google File System) as its underlying storage
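The structure listed above amounts to a sorted map from (row key, column family:qualifier, timestamp) to a cell value. A minimal sketch of that data model, assuming an in-memory dict and integer timestamps; the row and column names are illustrative and the functions are hypothetical, not BigTable's API.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

# (row key, "family:qualifier") -> versions sorted by timestamp
table: DefaultDict[Tuple[str, str], List[Tuple[int, str]]] = defaultdict(list)


def put(row: str, column: str, timestamp: int, value: str) -> None:
    bisect.insort(table[(row, column)], (timestamp, value))


def get(row: str, column: str, as_of: Optional[int] = None) -> Optional[str]:
    """Newest value at or before `as_of` (default: newest overall)."""
    for ts, value in reversed(table.get((row, column), [])):
        if as_of is None or ts <= as_of:
            return value
    return None


def scan_row(row: str, family: str) -> List[str]:
    """Column names of one row within one column family, in sorted order."""
    return sorted(col for (r, col) in table if r == row and col.startswith(family + ":"))


put("com.example.www", "contents:html", 3, "<html>v3")
put("com.example.www", "contents:html", 5, "<html>v5")
put("com.example.www", "anchor:news.example.org", 9, "Example")
print(get("com.example.www", "contents:html"))           # <html>v5
print(get("com.example.www", "contents:html", as_of=4))  # <html>v3
```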
HBase
Open source clone of BigTable
Same structure as BigTable
Uses HDFS instead of GFS
Hypertable
Another open source BigTable clone, written in C++
Focus on high performance
DynamoDB 
Key Value System 
Large Distributed Clusters 
Versioning
AWS S3
Accessed over HTTP
Stores blobs (arbitrary binary objects)
Riak
Inspired by Amazon's Dynamo
Open source and commercial versions
Key Value System
Large Distributed Clusters
Queries in Erlang or JavaScript
Consistent hashing and a gossip protocol to avoid a centralized index server
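Consistent hashing places both nodes and keys on a hash ring; each key belongs to the next node clockwise, so adding a node only moves the keys that now fall to it. A minimal sketch, assuming MD5 as the hash and 64 virtual nodes per physical node; production systems differ in details (hash function, how partitions are assigned), and the gossip protocol that spreads ring membership is not shown. `HashRing` is a hypothetical class.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes per physical node."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        # First virtual node at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that moves lands on the new node; the rest stay put, unlike modulo-based routing.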
Hadoop
Takes care of running your code across a cluster of machines:
- chunking up the input data
- sending it to each machine
- running your code on each chunk
- checking that the code ran
- passing any results to the next stage
- sorting between stages
- sending each chunk of that sorted data to the right machine
- writing debugging information on each job’s progress
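The steps above can be simulated in a single process with the classic word-count job: chunk the input, map each chunk, route each key to a reducer by hash, sort, then reduce. This is a sketch of the MapReduce idea only, not Hadoop's Java API; all function names are hypothetical, and the "machines" are just dict buckets.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple


def chunk(lines: List[str], size: int) -> List[List[str]]:
    # Chunk up the input data, one piece per (simulated) machine.
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # The user's map function, run on each chunk independently.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def partition(word: str, num_reducers: int) -> int:
    # Decide which reducer receives each key (stable hash of the key).
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # The user's reduce function, run once per distinct key.
    return word, sum(counts)


def run_job(lines: List[str], chunk_size: int = 2, num_reducers: int = 3) -> Dict[str, int]:
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for piece in chunk(lines, chunk_size):
        for word, one in map_words(piece):
            buckets[partition(word, num_reducers)].append((word, one))
    result: Dict[str, int] = {}
    for r in range(num_reducers):
        # Sort each reducer's input so equal keys are adjacent, then reduce.
        for word, group in groupby(sorted(buckets[r]), key=lambda kv: kv[0]):
            key, total = reduce_counts(word, (count for _, count in group))
            result[key] = total
    return result


lines = ["the quick brown fox", "the lazy dog", "The fox"]
print(run_job(lines))
```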
With Hive, you can program Hadoop jobs using a SQL-like language, HiveQL.
Apache Pig
A procedural data processing language designed for Hadoop
Provides a set of functions that help with common data processing problems
PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig, but you don't need to know much about Pig to use it.
Hadoop for Logs
The Flume project is designed to make data gathering easy and scalable by running agents on the source machines that pass data updates to collectors, which then aggregate them into large chunks that can be written efficiently as HDFS files.
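The agent/collector flow described above can be sketched as agents forwarding events to a collector that buffers them and flushes large batches. This is a toy model under stated assumptions: `Agent`, `Collector` and `flush_size` are hypothetical names (Flume's own configuration and component names differ), and a Python list stands in for files written to HDFS.

```python
from typing import Callable, List


class Collector:
    """Buffers events from many agents and flushes them in large batches."""

    def __init__(self, flush_size: int, sink: Callable[[List[str]], None]) -> None:
        self._flush_size = flush_size
        self._sink = sink
        self._buffer: List[str] = []

    def receive(self, event: str) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self._flush_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._sink(self._buffer)  # e.g. write one large file
            self._buffer = []


class Agent:
    """Runs on a source machine and forwards each new log line."""

    def __init__(self, host: str, collector: Collector) -> None:
        self._host = host
        self._collector = collector

    def on_log_line(self, line: str) -> None:
        self._collector.receive(f"{self._host}: {line}")


written_files: List[List[str]] = []  # stands in for files written to HDFS
collector = Collector(flush_size=3, sink=written_files.append)
web1, web2 = Agent("web1", collector), Agent("web2", collector)
for i in range(4):
    web1.on_log_line(f"GET /page/{i}")
    web2.on_log_line(f"GET /img/{i}")
collector.flush()  # write whatever is left in the buffer
```

Batching is the point: HDFS favours a few large files over many small ones.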
The R project is both a specialized language and a toolkit of modules aimed at anyone working with statistics.
Lucene is a Java library that handles indexing and searching large collections of documents, and Solr is an application that uses the library to build a search engine server.
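The core structure behind this kind of search is an inverted index: a map from each term to the documents that contain it. The sketch below shows that idea with a boolean AND query; it is plain Python, not Lucene's API, and omits what Lucene adds on top (analyzers, relevance scoring, on-disk segments). `tokenize` and `InvertedIndex` are hypothetical.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    # Minimal "analyzer": lowercase and split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """Ids of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        matches = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(matches)


index = InvertedIndex()
index.add(1, "Hadoop stores files in HDFS")
index.add(2, "HBase runs on top of HDFS")
index.add(3, "Solr is a search server built on Lucene")
print(index.search("hdfs"))  # [1, 2]
```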
Mahout is an open source framework that can run common machine learning algorithms on massive datasets. The framework makes it easy to use analysis techniques to implement features such as a “people who bought this also bought” recommendation engine on your own site.
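One of the simplest ways to build a "people who bought this also bought" feature is item co-occurrence counting: for each pair of items bought together, increment a counter, then rank. This sketch is plain Python on a tiny hypothetical dataset, not Mahout's API; at Mahout's scale the same counting would run as distributed jobs.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import DefaultDict, List


def build_cooccurrence(baskets: List[List[str]]) -> DefaultDict[str, Counter]:
    co: DefaultDict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        # Count each ordered pair of distinct items bought together.
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def also_bought(co: DefaultDict[str, Counter], item: str, top_n: int = 3) -> List[str]:
    # Rank by co-occurrence count, ties broken alphabetically.
    ranked = sorted(co.get(item, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


baskets = [
    ["hadoop book", "coffee", "laptop"],
    ["hadoop book", "coffee"],
    ["hadoop book", "laptop stand"],
    ["coffee", "mug"],
]
co = build_cooccurrence(baskets)
print(also_bought(co, "hadoop book"))  # ['coffee', 'laptop', 'laptop stand']
```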
ZooKeeper
Coordinates work and configuration across different clusters
Serialization
As data passes between systems and is stored in files at various points, it needs a serialization format:
JSON
BSON (Binary JSON)
Apache Thrift (predefined structure)
Apache Avro (predefined structure)
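The difference between self-describing formats (JSON) and formats with a predefined structure (Thrift, Avro) can be shown with the standard library alone. This sketch uses Python's `struct` module with a hypothetical fixed schema to illustrate the idea; it is not Thrift's or Avro's actual wire format. Because both sides agree on field order and types in advance, field names never travel with the data.

```python
import json
import struct
from typing import Any, Dict

record: Dict[str, Any] = {"user_id": 42, "age": 31, "score": 87.5}

# Self-describing text: field names are repeated in every record.
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Predefined structure (hypothetical schema): int64 user_id, int32 age,
# float64 score, little-endian with no padding. Only the values are written.
SCHEMA = struct.Struct("<qid")
FIELDS = ("user_id", "age", "score")


def encode(rec: Dict[str, Any]) -> bytes:
    return SCHEMA.pack(*(rec[f] for f in FIELDS))


def decode(data: bytes) -> Dict[str, Any]:
    return dict(zip(FIELDS, SCHEMA.unpack(data)))


as_binary = encode(record)
print(len(as_json), len(as_binary))
```

The trade-off: the binary form is smaller, but a reader cannot make sense of it without the schema.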
blog.andrefaria.com 
datavisionary.net 
andrefaria.com @andrefaria