SlideShare a Scribd company logo
Building Google-in-a-box:!
using Apache SolrCloud and Bigtop to index your bigdata
Who’s this guy?
Roman Shaposhnik!
@rhatr or rvs@apache.org
•  Sr. Manager at Pivotal Inc. building a team of ASF contributors
•  ASF junkie
•  VP of Apache Incubator, former VP of Apache Bigtop
•  Hadoop/Sqoop/Giraph committer
•  contributor across the Hadoop ecosystem)
•  Used to be root@Cloudera
•  Used to be a PHB at Yahoo!
•  Used to be a UNIX hacker at Sun microsystems
•  First time author: “Giraph in action”
What’s this all about?
This is NOT this kind of talk
This is this kind of a talk:
What are we building?
WWW analytics platform
HDFS
HBase
MapReduce
Nutch
WWW
Solr
Cloud
Lily
HBase
Indexer
Hive
Hue
DataSci
Replication
Morphlines
Pig
Zookeeper
Google papers
•  GFS (Google FS) == HDFS
•  MapReduce == MapReduce
•  Bigtable == HBase
•  Sawzall == Pig/Hive
•  F1 == HAWQ/Impala
Storage design requirements
•  Low-level storage layer: KISS
•  commodity hardware
•  massively scalable
•  highly available
•  minimalistic set of APIs (non-POSIX)
•  Application specific storage layer
•  leverages LLSL
•  Fast r/w random access (vs. immutable streaming)
•  Scan operations
Design patterns
•  HDFS is the “data lake”
•  Great simplification of storage administration (no-SANAS)
•  “Stateless” distributed applications persistence layer
•  Applications are “stateless” compositions of various services
•  Can be instantiated anywhere (think YARN)
•  Can restart serving up the state from HDFS
•  Are coordinated via Zookeeper
Application design: SolrCloud
HDFS
Zookeeper
Solr
svc …
Solr
svcI am alive
Who Am I?
What do I do?
Application design: SolrCloud
HDFS
Zookeeper
Solr
svc
Peer is dead
What do I do?
Application design: SolrCloud
HDFS
Zookeeper
Solr
svc …
Solr
svcI am alive
Who Am I?
What do I do?
replication kicks in
How do we build something like this?
The bill of materials
•  HDFS
•  Zookeeper
•  HBase
•  Nutch
•  Lily HBase indexer
•  SolrCloud
•  Morphlines (part of Project Kite)
•  Hue
•  Hive/Pig/…
How about?
$ for comp in hadoop hbase zookeeper … ; do
wget http://dist.apache.org/$comp
tar xzvf $comp.tar.gz
cd $comp ; mvn/ant/make install
scp …
ssh …
done
How about?
$ for comp in hadoop hbase zookeeper … ; do
wget http://dist.apache.org/$comp
tar xzvf $comp.tar.gz
cd $comp ; mvn/ant/make install
scp …
ssh …
done
We’ve seen this before!
GNU Software Linux kernel
Apache Bigtop!
HBase, Solr.. Hadoop
Lets get down to business
Still remember this?
HDFS
HBase
Solr
Cloud
Lily
HBase
Indexer
Replication
Morphlines
Zookeeper
Hue
DataSci
HBase: row-key design
com.cnn.www/a.html <html>...
content:	

CNN CNN.com
anchor:a.com	

 anchor:b.com
Indexing: schema design
•  Bad news: no more “schema on write”
•  Good news: you can change it on the fly
•  Lets start with the simplest one:!
!
<field name=”id" type=”string" indexed="true" stored="true” required=“true”/>
<field name=”text" type="text_general" indexed="true" stored="true"/>
<field name=“url” type=”string" indexed="true" stored="true”/>
Deployment
•  Single node pseudo distributed configuration
•  Puppet-driven deployment
•  Bigtop comes with modules
•  You provide your own cluster topology in cluster.pp
Deploying the ‘data lake’
•  Zookeeper
•  3-5 members of the ensemble

# vi /etc/zookeeper/conf/zoo.cfg
# service zookeeper-server init
# service zookeeper-server start!

•  HDFS
•  tons of configurations to consider: HA, NFS, etc.
•  see above, plus: /usr/lib/hadoop/libexec/init-hdfs.sh
HBase asynchronous indexing
•  leveraging WAL for indexing
•  can achieve infinite scalability of the indexer
•  doesn’t slow down HBase (unlike co-processors)
•  /etc/hbase/conf/hbase-site.xml:
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
Different clouds
HBase
Region
Server
HBase
Region
Server
Lily
Indexer
Node
Lily
Indexer
Node
Solr
Node
Solr
Node
HBase “cloud” Lily Indexer “cloud” SolrCloud
home of Morphline ETL
… … …
replication Solr docs
Lily HBase indexer
•  Pretends to be a region server on the receiving end
•  Gets records
•  Pipes them through the Morphline ETL
•  Feeds the result to Solr
•  All operations are managed via individual indexers
Creating an indexer

$ hbase-indexer add-indexer !
--name web_crawl !
--indexer-conf ./indexer.xml !
--connection-param solr.zk=localhost/solr !
--connection-param solr.collection=web_crawl !
--zookeeper localhost:2181
indexer.xml

<indexer table="web_crawl"
mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">

<param name="morphlineFile" value="/etc/hbase-solr/conf/morphlines.conf"/>
<!-- <param name="morphlineId" value="morphline1"/> à
</indexer>
Morphlines
•  Part of Project Kite (look for it on GitHub)
•  A very flexible ETL library (not just for HBase)
•  “UNIX pipes” for bigdata
•  Designed for NRT processing
•  Record-oriented processing driven by HOCON definition
•  Require a “pump” (most of the time)
•  Have built-in syncs (e.g. loadSolr)
•  Essentially a push-based data flow engine
Different clouds
extractHBaseCells
convertHTML
WAL entries
N records
xquery
logInfo
M records
P records
Solr docs
Morphline spec
morphlines : [
{ id : morphline1
importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]

commands : [
{ extractHBaseCells {…} }
{ convertHTML {charset : UTF-8} }
{ xquery {…} }
{ logInfo { format : "output record: {}", args : ["@{}"] } }
]
}
]
extractHBaseCells
{
extractHBaseCells {
mappings : [
{
inputColumn : "content:*"
outputField : "_attachment_body"
type : "byte[]"
source : value
}
]
}
}
xquery
{ xquery {
fragments : [ {
fragmentPath : "/"
queryString : """
<fieldsToIndex>
<webpage> {for $tk in //text() return concat($tk, ' ')} </webpage>
</fieldsToIndex>
"""
}
]
}
}
SolrCloud
•  Serves up lucene indices from HDFS
•  A webapp running on bigtop-tomcat
•  gets configured via /etc/default/solr!
SOLR_PORT=8983
SOLR_ADMIN_PORT=8984
SOLR_LOG=/var/log/solr
SOLR_ZK_ENSEMBLE=localhost:2181/solr
SOLR_HDFS_HOME=hdfs://localhost:8020/solr
SOLR_HDFS_CONFIG=/etc/hadoop/conf!
!
Collections and instancedirs
•  All of these objects reside in Zookeeper
•  An unfortunate trend we already saw with Lily indexers
•  Collection
•  a distributed set of lucene indices
•  an object defined by Zookeeper configuration
•  Collection require (and can share) configurations in instancedir
•  Bigtop-provided tool: solrcrl!
$ solrctl [init|instacedir|collection|…]
Creating a collection
# solrctl init
$ solrctl instancedir --generate /tmp/web_crawl
$ vim /tmp/web_crawl/conf/schema.xml
$ vim /tmp/web_crawl/conf/solrconfig.xml
$ solrctl instancedir --create web_crawl /tmp/web_crawl
$ solrctl collection --create web_crawl -s 1
Hue
•  Apache licensed, but not an ASF project
•  A nice, flexible UI for Hadoop bigdata management platform
•  Follows an extensible app model
Demo time!
Where to go from here
Questions?

More Related Content

What's hot

Ansible for large scale deployment
Ansible for large scale deploymentAnsible for large scale deployment
Ansible for large scale deployment
Karthik .P.R
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Lucidworks
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
Cyanny LIANG
 
MySQL in the Hosted Cloud
MySQL in the Hosted CloudMySQL in the Hosted Cloud
MySQL in the Hosted Cloud
Colin Charles
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
Rahul Jain
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen Zhang
Chen Zhang
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
DataStax Academy
 
Apache Spark in Industry
Apache Spark in IndustryApache Spark in Industry
Apache Spark in Industry
Dorian Beganovic
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute Beginner
Ike Ellis
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
Andrew Brust
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Lucidworks
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
Cloudera, Inc.
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
Avinash Ramineni
 
Cassandra in e-commerce
Cassandra in e-commerceCassandra in e-commerce
Cassandra in e-commerce
Alexander Solovyev
 
Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.
Mydbops
 
keyvi the key value index @ Cliqz
keyvi the key value index @ Cliqzkeyvi the key value index @ Cliqz
keyvi the key value index @ Cliqz
Hendrik Muhs
 
Plmce2012 scaling pinterest
Plmce2012 scaling pinterestPlmce2012 scaling pinterest
Plmce2012 scaling pinterestMohit Jain
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark Summit
 

What's hot (20)

Ansible for large scale deployment
Ansible for large scale deploymentAnsible for large scale deployment
Ansible for large scale deployment
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
MySQL in the Hosted Cloud
MySQL in the Hosted CloudMySQL in the Hosted Cloud
MySQL in the Hosted Cloud
 
Rails on HBase
Rails on HBaseRails on HBase
Rails on HBase
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen Zhang
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Apache Spark in Industry
Apache Spark in IndustryApache Spark in Industry
Apache Spark in Industry
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute Beginner
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
Cassandra in e-commerce
Cassandra in e-commerceCassandra in e-commerce
Cassandra in e-commerce
 
Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.
 
keyvi the key value index @ Cliqz
keyvi the key value index @ Cliqzkeyvi the key value index @ Cliqz
keyvi the key value index @ Cliqz
 
Plmce2012 scaling pinterest
Plmce2012 scaling pinterestPlmce2012 scaling pinterest
Plmce2012 scaling pinterest
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 

Viewers also liked

Migration from FAST ESP to Solr
Migration from FAST ESP to SolrMigration from FAST ESP to Solr
Migration from FAST ESP to Solr
TNR Global
 
Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments
buildacloud
 
TriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr HadoopTriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr Hadoop
Grant Ingersoll
 
CDAP, la boîte à outil pour concevoir vos applications Big Data
CDAP,  la boîte à outil pour concevoir vos applications Big DataCDAP,  la boîte à outil pour concevoir vos applications Big Data
CDAP, la boîte à outil pour concevoir vos applications Big Data
Synaltic Group
 
Web scraping with nutch solr
Web scraping with nutch solrWeb scraping with nutch solr
Web scraping with nutch solr
Mike Frampton
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
thelabdude
 
How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
Sematext Group, Inc.
 

Viewers also liked (7)

Migration from FAST ESP to Solr
Migration from FAST ESP to SolrMigration from FAST ESP to Solr
Migration from FAST ESP to Solr
 
Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments
 
TriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr HadoopTriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr Hadoop
 
CDAP, la boîte à outil pour concevoir vos applications Big Data
CDAP,  la boîte à outil pour concevoir vos applications Big DataCDAP,  la boîte à outil pour concevoir vos applications Big Data
CDAP, la boîte à outil pour concevoir vos applications Big Data
 
Web scraping with nutch solr
Web scraping with nutch solrWeb scraping with nutch solr
Web scraping with nutch solr
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 
How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
 

Similar to Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your bigdata

Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
Corley S.r.l.
 
Polyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudPolyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudAndrei Savu
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
Evans Ye
 
Cosmos, Big Data GE implementation in FIWARE
Cosmos, Big Data GE implementation in FIWARECosmos, Big Data GE implementation in FIWARE
Cosmos, Big Data GE implementation in FIWARE
Fernando Lopez Aguilar
 
Cosmos, Big Data GE Implementation
Cosmos, Big Data GE ImplementationCosmos, Big Data GE Implementation
Cosmos, Big Data GE ImplementationFIWARE
 
Apache Spark: killer or savior of Apache Hadoop?
Apache Spark: killer or savior of Apache Hadoop?Apache Spark: killer or savior of Apache Hadoop?
Apache Spark: killer or savior of Apache Hadoop?
rhatr
 
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of AltiscaleDebugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Data Con LA
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystemAndrew Brust
 
Elephant in the cloud
Elephant in the cloudElephant in the cloud
Elephant in the cloud
rhatr
 
מיכאל
מיכאלמיכאל
מיכאל
sqlserver.co.il
 
The other Apache technologies your big data solution needs!
The other Apache technologies your big data solution needs!The other Apache technologies your big data solution needs!
The other Apache technologies your big data solution needs!
gagravarr
 
HBase, no trouble
HBase, no troubleHBase, no trouble
HBase, no trouble
LINE Corporation
 
Infrastructure as Code: Introduction to Terraform
Infrastructure as Code: Introduction to TerraformInfrastructure as Code: Introduction to Terraform
Infrastructure as Code: Introduction to Terraform
Alexander Popov
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
Steve Staso
 
OC Big Data Monthly Meetup #5 - Session 1 - Altiscale
OC Big Data Monthly Meetup #5 - Session 1 - AltiscaleOC Big Data Monthly Meetup #5 - Session 1 - Altiscale
OC Big Data Monthly Meetup #5 - Session 1 - Altiscale
Big Data Joe™ Rossi
 
6.hive
6.hive6.hive
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h base
hdhappy001
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Data Con LA
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airship
dave_revell
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
Brian Enochson
 

Similar to Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your bigdata (20)

Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
 
Polyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudPolyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the Cloud
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
Cosmos, Big Data GE implementation in FIWARE
Cosmos, Big Data GE implementation in FIWARECosmos, Big Data GE implementation in FIWARE
Cosmos, Big Data GE implementation in FIWARE
 
Cosmos, Big Data GE Implementation
Cosmos, Big Data GE ImplementationCosmos, Big Data GE Implementation
Cosmos, Big Data GE Implementation
 
Apache Spark: killer or savior of Apache Hadoop?
Apache Spark: killer or savior of Apache Hadoop?Apache Spark: killer or savior of Apache Hadoop?
Apache Spark: killer or savior of Apache Hadoop?
 
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of AltiscaleDebugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystem
 
Elephant in the cloud
Elephant in the cloudElephant in the cloud
Elephant in the cloud
 
מיכאל
מיכאלמיכאל
מיכאל
 
The other Apache technologies your big data solution needs!
The other Apache technologies your big data solution needs!The other Apache technologies your big data solution needs!
The other Apache technologies your big data solution needs!
 
HBase, no trouble
HBase, no troubleHBase, no trouble
HBase, no trouble
 
Infrastructure as Code: Introduction to Terraform
Infrastructure as Code: Introduction to TerraformInfrastructure as Code: Introduction to Terraform
Infrastructure as Code: Introduction to Terraform
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
OC Big Data Monthly Meetup #5 - Session 1 - Altiscale
OC Big Data Monthly Meetup #5 - Session 1 - AltiscaleOC Big Data Monthly Meetup #5 - Session 1 - Altiscale
OC Big Data Monthly Meetup #5 - Session 1 - Altiscale
 
6.hive
6.hive6.hive
6.hive
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h base
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airship
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 

More from rhatr

Unikernels: in search of a killer app and a killer ecosystem
Unikernels: in search of a killer app and a killer ecosystemUnikernels: in search of a killer app and a killer ecosystem
Unikernels: in search of a killer app and a killer ecosystem
rhatr
 
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
rhatr
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
rhatr
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
rhatr
 
OSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofOSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear of
rhatr
 
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platformApache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
rhatr
 
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
rhatr
 

More from rhatr (7)

Unikernels: in search of a killer app and a killer ecosystem
Unikernels: in search of a killer app and a killer ecosystemUnikernels: in search of a killer app and a killer ecosystem
Unikernels: in search of a killer app and a killer ecosystem
 
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
 
OSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofOSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear of
 
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platformApache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
 
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
 

Recently uploaded

Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
Srikant77
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 

Recently uploaded (20)

Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 

Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your bigdata