SlideShare a Scribd company logo
1 of 36
1
Real-time searching
of big data with
Solr and Hadoop
Rod Cope, CTO
2
• Introduction
• The problem
• The solution
• Top 10 lessons
• Final thoughts
• Q&A
3
Rod Cope, CTO
Rogue Wave Software
4
• “Big data”
– All the world’s open source software
– Metadata, code, indexes
– Individual tables contain many terabytes
– Relational databases aren’t scale-free
• Growing every day
• Need real-time random access to all data
• Long-running and complex analysis jobs
5
• Hadoop, HBase, and Solr
– Hadoop – distributed file system, map/reduce
– HBase – “NoSQL” data store – column-oriented
– Solr – search server based on Lucene
– All are scalable, flexible, fast, well-supported, and used in
production environments
• And a supporting cast of thousands…
– Stargate, MySQL, Rails, Redis, Resque,
– Nginx, Unicorn, HAProxy, Memcached,
– Ruby, JRuby, CentOS, …
6
Internet Application LAN Data LAN *Caching and load balancing not shown
7
• HBase  NoSQL
– Think hash table, not relational database
• How do find my data if primary key won’t cut it?
• Solr to the rescue
– Very fast, highly scalable search server with built-in sharding
and replication – based on Lucene
– Dynamic schema, powerful query language, faceted search,
accessible via simple REST-like web API w/XML, JSON, Ruby,
and other data formats
8
• Sharding
– Query any server – it executes the same query against all other servers in
the group
– Returns aggregated result to original caller
• Async replication (slaves poll their masters)
– Can use repeaters if replicating across data centers
• OpenLogic
– Solr farm, sharded, cross-replicated, fronted with HAProxy
• Load balanced writes across masters, reads across masters and slaves
• Be careful not to over-commit
– Billions of lines of code in HBase, all indexed in Solr for real-time search in
multiple ways
– Over 20 Solr fields indexed per source file
9
10
11
12
13
14
15
• Experiment with different Solr merge factors
– During huge loads, it can help to use a higher factor for load
performance
• Minimize index manipulation gymnastics
• Start with something like 25
– When you’re done with the massive initial load/import, turn it
back down for search performance
• Minimize number of queries
• Start with something like 5
• Note that a small merge factor will hurt indexing performance if you
need to do massive loads on a frequent basis or continuous
indexing
16
• Test your write-focused load balancing
– Look for large skews in Solr index size
– Note: you may have to commit, optimize, write again, and commit
before you can really tell
• Make sure your replication slaves are keeping up
– Using identical hardware helps
– If index directories don’t look the same, something is wrong
17
• Don’t commit to Solr too frequently
– It’s easy to auto-commit or commit after every record
– Doing this 100’s of times per second will take Solr down,
especially if you have serious warm up queries configured
• Avoid putting large values in HBase (> 5MB)
– Works, but may cause instability and/or performance issues
– Rows and columns are cheap, so use more of them instead
18
• Don’t use a single machine to load the cluster
– You might not live long enough to see it finish
• At OpenLogic, we spread raw source data across many machines and
hard drives via NFS
– Be very careful with NFS configuration – can hang machines
• Load data into HBase via Hadoop map/reduce jobs
– Turn off WAL for much better performance
• put.setWriteToWAL(false)
– Index in Solr as you go
• Good way to test your load balancing write schemes and replication set up
– This will find your weak spots!
19
• Writing data loading jobs can be tedious
• Scripting is faster and easier than writing Java
• Great for system administration tasks, testing
• Standard HBase shell is based on JRuby
• Very easy Map/Reduce jobs with J/Ruby and Wukong
• Used heavily at OpenLogic
– Productivity of Ruby
– Power of Java Virtual Machine
– Ruby on Rails, Hadoop integration, GUI clients
20
21
JRuby
list = ["Rod", "Neeta", "Eric", "Missy"]
shorts = list.find_all { |name| name.size <= 4 }
puts shorts.size
shorts.each { |name| puts name }
-> 2
-> Rod
Eric
Groovy
list = ["Rod", "Neeta", "Eric", "Missy"]
shorts = list.findAll { name -> name.size() <= 4 }
println shorts.size
shorts.each { name -> println name }
-> 2
-> Rod
Eric
22
• Hadoop
– SPOF around Namenode, append functionality
• HBase
– Backup, replication, and indexing solutions in flux
• Solr
– Several competing solutions around cloud-like scalability and
fault-tolerance, including ZooKeeper and Hadoop integration
– No clear winner, none quite ready for production
23
• Many moving parts
– It’s easy to let typos slip through
– Consider automated configuration via Chef, Puppet, or similar
• Pay attention to the details
– Operating system – max open files, sockets, and other limits
– Hadoop and HBase configuration
• http://hbase.apache.org/book.html#trouble
– Solr merge factor and norms
• Don’t starve HBase or Solr for memory
– Swapping will cripple your system
24
• “Commodity hardware” != 3 year old desktop
• Dual quad-core, 32GB RAM, 4+ disks
• Don’t bother with RAID on Hadoop data disks
– Be wary of non-enterprise drives
• Expect ugly hardware issues at some point
25
• Environment
– 100+ CPU cores
– 100+ Terabytes of disk
– Machines don’t have identity
– Add capacity by plugging in new machines
• Why not Amazon EC2?
– Great for computational bursts
– Expensive for long-term storage of big data
– Not yet consistent enough for mission-critical usage of HBase
26
• Dual quad-core and dual hex-core
• Dell boxes
• 32-64GB RAM
– ECC (highly recommended by Google)
• 6 x 2TB enterprise hard drives
• RAID 1 on two of the drives
– OS, Hadoop, HBase, Solr, NFS mounts (be careful!), job code, etc.
– Key “source” data backups
• Hadoop datanode gets remaining drives
• Redundant enterprise switches
• Dual- and quad-gigabit NIC’s
27
• Amazon EC2
– EBS Storage
• 100TB * $0.10/GB/month = $120k/year
– Double Extra Large instances
• 13 EC2 compute units, 34.2GB RAM
• 20 instances * $1.00/hr * 8,760 hrs/yr = $175k/year
• 3 year reserved instances
– 20 * 4k = $80k up front to reserve
– (20 * $0.34/hr * 8,760 hrs/yr * 3 yrs) / 3 = $86k/year to operate
– Totals for 20 virtual machines
• 1st year cost: $120k + $80k + $86k = $286k
• 2nd & 3rd year costs: $120k + $86k = $206k
• Average: ($286k + $206k + $206k) / 3 = $232k/year
28
• Buy your own
– 20 * Dell servers w/12 CPU cores, 32GB RAM, 5 TB disk = $160k
• Over 33 EC2 compute units each
– Total: $53k/year (amortized over 3 years)
29
• Amazon EC2
– 20 instances * 13 EC2 compute units = 260 EC2 compute units
– Cost: $232k/year
• Buy your own
– 20 machines * 33 EC2 compute units = 660 EC2 compute units
– Cost: $53k/year
– Does not include hosting and maintenance costs
• Don’t think system administration goes away
– You still “own” all the instances – monitoring, debugging, support
30
• Hardware
– Power supplies, hard drives
• Operating System
– Kernel panics, zombie processes, dropped packets
• Software Servers
– Hadoop datanodes, HBase regionservers, Stargate servers, Solr
servers
• Your Code and Data
– Stray map/reduce jobs, strange corner cases in your data leading
to program failures
31
32
• Hadoop, HBase, Solr
• Apache, Tomcat, ZooKeeper,
HAProxy
• Stargate, JRuby, Lucene, Jetty,
HSQLDB, Geronimo
• Apache Commons, JUnit
• CentOS
• Dozens more
• Too expensive to build or buy
everything
33
• You can host big data in your own private cloud
– Tools are available today that didn’t exist a few years ago
– Fast to prototype – production readiness takes time
– Expect to invest in training and support
• Hbase and Solr are fast
– 100+ random queries/sec per instance
– Give them memory and stand back
• Hbase Scales, Solr scales (to a point)
– Don’t worry about out-growing a few machines
– Do worry about out-growing a rack of Solr instances
• Look for ways to partition your data other than “automatic” sharding
34
Q&A
35
Rod Cope
rod.cope@roguewave.com
See us in action:
www.roguewave.com
36

More Related Content

What's hot

Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationAlex Moundalexis
 
Shard-Query, an MPP database for the cloud using the LAMP stack
Shard-Query, an MPP database for the cloud using the LAMP stackShard-Query, an MPP database for the cloud using the LAMP stack
Shard-Query, an MPP database for the cloud using the LAMP stackJustin Swanhart
 
Intro to big data choco devday - 23-01-2014
Intro to big data   choco devday - 23-01-2014Intro to big data   choco devday - 23-01-2014
Intro to big data choco devday - 23-01-2014Hassan Islamov
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with DockerFabio Fumarola
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopDataWorks Summit
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme MakeoverHBaseCon
 
Let's Talk Operations! (Hadoop Summit 2014)
Let's Talk Operations! (Hadoop Summit 2014)Let's Talk Operations! (Hadoop Summit 2014)
Let's Talk Operations! (Hadoop Summit 2014)Allen Wittenauer
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedInAllen Wittenauer
 
HBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupHBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupCloudera, Inc.
 
Divide and conquer in the cloud
Divide and conquer in the cloudDivide and conquer in the cloud
Divide and conquer in the cloudJustin Swanhart
 
Conquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard queryConquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard queryJustin Swanhart
 
Simple Works Best
 Simple Works Best Simple Works Best
Simple Works BestEDB
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configurationSubhas Kumar Ghosh
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersAmal G Jose
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Cosmin Lehene
 
HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsCloudera, Inc.
 

What's hot (20)

Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
HBase lon meetup
HBase lon meetupHBase lon meetup
HBase lon meetup
 
Shard-Query, an MPP database for the cloud using the LAMP stack
Shard-Query, an MPP database for the cloud using the LAMP stackShard-Query, an MPP database for the cloud using the LAMP stack
Shard-Query, an MPP database for the cloud using the LAMP stack
 
Intro to big data choco devday - 23-01-2014
Intro to big data   choco devday - 23-01-2014Intro to big data   choco devday - 23-01-2014
Intro to big data choco devday - 23-01-2014
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for Hadoop
 
Apache kudu
Apache kuduApache kudu
Apache kudu
 
March 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling HadoopMarch 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling Hadoop
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme Makeover
 
Let's Talk Operations! (Hadoop Summit 2014)
Let's Talk Operations! (Hadoop Summit 2014)Let's Talk Operations! (Hadoop Summit 2014)
Let's Talk Operations! (Hadoop Summit 2014)
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
 
HBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupHBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User Group
 
Divide and conquer in the cloud
Divide and conquer in the cloudDivide and conquer in the cloud
Divide and conquer in the cloud
 
Conquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard queryConquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard query
 
Simple Works Best
 Simple Works Best Simple Works Best
Simple Works Best
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop Clusters
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015
 
HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table Snapshots
 

Viewers also liked

Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...slashn
 
Five ways to protect your software supply chain from hacks, quacks, and wrecks
Five ways to protect your software supply chain from hacks, quacks, and wrecksFive ways to protect your software supply chain from hacks, quacks, and wrecks
Five ways to protect your software supply chain from hacks, quacks, and wrecksRogue Wave Software
 
Shifting the conversation from active interception to proactive neutralization
Shifting the conversation from active interception to proactive neutralization Shifting the conversation from active interception to proactive neutralization
Shifting the conversation from active interception to proactive neutralization Rogue Wave Software
 
Top 5 best practice for delivering secure in-vehicle software
Top 5 best practice for delivering secure in-vehicle softwareTop 5 best practice for delivering secure in-vehicle software
Top 5 best practice for delivering secure in-vehicle softwareRogue Wave Software
 
Enterprise Search: An Information Architect's Perspective
Enterprise Search: An Information Architect's PerspectiveEnterprise Search: An Information Architect's Perspective
Enterprise Search: An Information Architect's PerspectivePeter Morville
 
Legal and Practical Concerns with Software Development
Legal and Practical Concerns with Software DevelopmentLegal and Practical Concerns with Software Development
Legal and Practical Concerns with Software DevelopmentRogue Wave Software
 
TriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr HadoopTriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr HadoopGrant Ingersoll
 
Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & SolrLucidworks
 
Enterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for SearchEnterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for SearchSearch Technologies
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...lucenerevolution
 
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaSolr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaLucidworks
 
Dlvr.it 使用說明
Dlvr.it 使用說明Dlvr.it 使用說明
Dlvr.it 使用說明waytorich
 
Faster, Cheaper, Better - Replacing Oracle with Hadoop & Solr
Faster, Cheaper, Better - Replacing Oracle with Hadoop & SolrFaster, Cheaper, Better - Replacing Oracle with Hadoop & Solr
Faster, Cheaper, Better - Replacing Oracle with Hadoop & SolrKen Krugler
 

Viewers also liked (13)

Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
 
Five ways to protect your software supply chain from hacks, quacks, and wrecks
Five ways to protect your software supply chain from hacks, quacks, and wrecksFive ways to protect your software supply chain from hacks, quacks, and wrecks
Five ways to protect your software supply chain from hacks, quacks, and wrecks
 
Shifting the conversation from active interception to proactive neutralization
Shifting the conversation from active interception to proactive neutralization Shifting the conversation from active interception to proactive neutralization
Shifting the conversation from active interception to proactive neutralization
 
Top 5 best practice for delivering secure in-vehicle software
Top 5 best practice for delivering secure in-vehicle softwareTop 5 best practice for delivering secure in-vehicle software
Top 5 best practice for delivering secure in-vehicle software
 
Enterprise Search: An Information Architect's Perspective
Enterprise Search: An Information Architect's PerspectiveEnterprise Search: An Information Architect's Perspective
Enterprise Search: An Information Architect's Perspective
 
Legal and Practical Concerns with Software Development
Legal and Practical Concerns with Software DevelopmentLegal and Practical Concerns with Software Development
Legal and Practical Concerns with Software Development
 
TriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr HadoopTriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr Hadoop
 
Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
Enterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for SearchEnterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for Search
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
 
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaSolr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
 
Dlvr.it 使用說明
Dlvr.it 使用說明Dlvr.it 使用說明
Dlvr.it 使用說明
 
Faster, Cheaper, Better - Replacing Oracle with Hadoop & Solr
Faster, Cheaper, Better - Replacing Oracle with Hadoop & SolrFaster, Cheaper, Better - Replacing Oracle with Hadoop & Solr
Faster, Cheaper, Better - Replacing Oracle with Hadoop & Solr
 

Similar to Real-time searching of big data with Solr and Hadoop

The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...Lucas Jellema
 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015 clairvoyantllc
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systemselliando dias
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around HadoopDataWorks Summit
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerIke Ellis
 
Managing growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsManaging growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsDataWorks Summit
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברגTaldor Group
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopGwen (Chen) Shapira
 
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Teradata Partners Conference Oct 2014   Big Data Anti-PatternsTeradata Partners Conference Oct 2014   Big Data Anti-Patterns
Teradata Partners Conference Oct 2014 Big Data Anti-PatternsDouglas Moore
 
Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)guest0f8e278
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLRichard Schneeman
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with HadoopCloudera, Inc.
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Eric Baldeschwieler
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLArseny Chernov
 
Redis - The Universal NoSQL Tool
Redis - The Universal NoSQL ToolRedis - The Universal NoSQL Tool
Redis - The Universal NoSQL ToolEberhard Wolff
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 

Similar to Real-time searching of big data with Solr and Hadoop (20)

The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around Hadoop
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute Beginner
 
Managing growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsManaging growth in Production Hadoop Deployments
Managing growth in Production Hadoop Deployments
 
Redshift deep dive
Redshift deep diveRedshift deep dive
Redshift deep dive
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברג
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for Hadoop
 
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Teradata Partners Conference Oct 2014   Big Data Anti-PatternsTeradata Partners Conference Oct 2014   Big Data Anti-Patterns
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
 
Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQL
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with Hadoop
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
 
Redis - The Universal NoSQL Tool
Redis - The Universal NoSQL ToolRedis - The Universal NoSQL Tool
Redis - The Universal NoSQL Tool
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 

More from Rogue Wave Software

The Global Influence of Open Banking, API Security, and an Open Data Perspective
The Global Influence of Open Banking, API Security, and an Open Data PerspectiveThe Global Influence of Open Banking, API Security, and an Open Data Perspective
The Global Influence of Open Banking, API Security, and an Open Data PerspectiveRogue Wave Software
 
No liftoff, touchdown, or heartbeat shall miss because of a software failure
No liftoff, touchdown, or heartbeat shall miss because of a software failureNo liftoff, touchdown, or heartbeat shall miss because of a software failure
No liftoff, touchdown, or heartbeat shall miss because of a software failureRogue Wave Software
 
Disrupt or be disrupted – Using secure APIs to drive digital transformation
Disrupt or be disrupted – Using secure APIs to drive digital transformationDisrupt or be disrupted – Using secure APIs to drive digital transformation
Disrupt or be disrupted – Using secure APIs to drive digital transformationRogue Wave Software
 
Leveraging open banking specifications for rigorous API security – What’s in...
Leveraging open banking specifications for rigorous API security –  What’s in...Leveraging open banking specifications for rigorous API security –  What’s in...
Leveraging open banking specifications for rigorous API security – What’s in...Rogue Wave Software
 
Adding layers of security to an API in real-time
Adding layers of security to an API in real-timeAdding layers of security to an API in real-time
Adding layers of security to an API in real-timeRogue Wave Software
 
Getting the most from your API management platform: A case study
Getting the most from your API management platform: A case studyGetting the most from your API management platform: A case study
Getting the most from your API management platform: A case studyRogue Wave Software
 
Advanced technologies and techniques for debugging HPC applications
Advanced technologies and techniques for debugging HPC applicationsAdvanced technologies and techniques for debugging HPC applications
Advanced technologies and techniques for debugging HPC applicationsRogue Wave Software
 
The forgotten route: Making Apache Camel work for you
The forgotten route: Making Apache Camel work for youThe forgotten route: Making Apache Camel work for you
The forgotten route: Making Apache Camel work for youRogue Wave Software
 
Are open source and embedded software development on a collision course?
Are open source and embedded software development on a  collision course?Are open source and embedded software development on a  collision course?
Are open source and embedded software development on a collision course?Rogue Wave Software
 
Three big mistakes with APIs and microservices
Three big mistakes with APIs and microservices Three big mistakes with APIs and microservices
Three big mistakes with APIs and microservices Rogue Wave Software
 
5 strategies for enterprise cloud infrastructure success
5 strategies for enterprise cloud infrastructure success5 strategies for enterprise cloud infrastructure success
5 strategies for enterprise cloud infrastructure successRogue Wave Software
 
PSD2 & Open Banking: How to go from standards to implementation and compliance
PSD2 & Open Banking: How to go from standards to implementation and compliancePSD2 & Open Banking: How to go from standards to implementation and compliance
PSD2 & Open Banking: How to go from standards to implementation and complianceRogue Wave Software
 
Java 10 and beyond: Keeping up with the language and planning for the future
Java 10 and beyond: Keeping up with the language and planning for the futureJava 10 and beyond: Keeping up with the language and planning for the future
Java 10 and beyond: Keeping up with the language and planning for the futureRogue Wave Software
 
How to keep developers happy and lawyers calm (Presented at ESC Boston)
How to keep developers happy and lawyers calm (Presented at ESC Boston)How to keep developers happy and lawyers calm (Presented at ESC Boston)
How to keep developers happy and lawyers calm (Presented at ESC Boston)Rogue Wave Software
 
Open source applied - Real world use cases (Presented at Open Source 101)
Open source applied - Real world use cases (Presented at Open Source 101)Open source applied - Real world use cases (Presented at Open Source 101)
Open source applied - Real world use cases (Presented at Open Source 101)Rogue Wave Software
 
How to migrate SourcePro apps from Solaris to Linux
How to migrate SourcePro apps from Solaris to LinuxHow to migrate SourcePro apps from Solaris to Linux
How to migrate SourcePro apps from Solaris to LinuxRogue Wave Software
 
Approaches to debugging mixed-language HPC apps
Approaches to debugging mixed-language HPC appsApproaches to debugging mixed-language HPC apps
Approaches to debugging mixed-language HPC appsRogue Wave Software
 
Enterprise Linux: Justify your migration from Red Hat to CentOS
Enterprise Linux: Justify your migration from Red Hat to CentOSEnterprise Linux: Justify your migration from Red Hat to CentOS
Enterprise Linux: Justify your migration from Red Hat to CentOSRogue Wave Software
 
Walk through an enterprise Linux migration
Walk through an enterprise Linux migrationWalk through an enterprise Linux migration
Walk through an enterprise Linux migrationRogue Wave Software
 
How to keep developers happy and lawyers calm
How to keep developers happy and lawyers calmHow to keep developers happy and lawyers calm
How to keep developers happy and lawyers calmRogue Wave Software
 

More from Rogue Wave Software (20)

The Global Influence of Open Banking, API Security, and an Open Data Perspective
The Global Influence of Open Banking, API Security, and an Open Data PerspectiveThe Global Influence of Open Banking, API Security, and an Open Data Perspective
The Global Influence of Open Banking, API Security, and an Open Data Perspective
 
No liftoff, touchdown, or heartbeat shall miss because of a software failure
No liftoff, touchdown, or heartbeat shall miss because of a software failureNo liftoff, touchdown, or heartbeat shall miss because of a software failure
No liftoff, touchdown, or heartbeat shall miss because of a software failure
 
Disrupt or be disrupted – Using secure APIs to drive digital transformation
Disrupt or be disrupted – Using secure APIs to drive digital transformationDisrupt or be disrupted – Using secure APIs to drive digital transformation
Disrupt or be disrupted – Using secure APIs to drive digital transformation
 
Leveraging open banking specifications for rigorous API security – What’s in...
Leveraging open banking specifications for rigorous API security –  What’s in...Leveraging open banking specifications for rigorous API security –  What’s in...
Leveraging open banking specifications for rigorous API security – What’s in...
 
Adding layers of security to an API in real-time
Adding layers of security to an API in real-timeAdding layers of security to an API in real-time
Adding layers of security to an API in real-time
 
Getting the most from your API management platform: A case study
Getting the most from your API management platform: A case studyGetting the most from your API management platform: A case study
Getting the most from your API management platform: A case study
 
Advanced technologies and techniques for debugging HPC applications
Advanced technologies and techniques for debugging HPC applicationsAdvanced technologies and techniques for debugging HPC applications
Advanced technologies and techniques for debugging HPC applications
 
The forgotten route: Making Apache Camel work for you
The forgotten route: Making Apache Camel work for youThe forgotten route: Making Apache Camel work for you
The forgotten route: Making Apache Camel work for you
 
Are open source and embedded software development on a collision course?
Are open source and embedded software development on a  collision course?Are open source and embedded software development on a  collision course?
Are open source and embedded software development on a collision course?
 
Three big mistakes with APIs and microservices
Three big mistakes with APIs and microservices Three big mistakes with APIs and microservices
Three big mistakes with APIs and microservices
 
5 strategies for enterprise cloud infrastructure success
5 strategies for enterprise cloud infrastructure success5 strategies for enterprise cloud infrastructure success
5 strategies for enterprise cloud infrastructure success
 
PSD2 & Open Banking: How to go from standards to implementation and compliance
PSD2 & Open Banking: How to go from standards to implementation and compliancePSD2 & Open Banking: How to go from standards to implementation and compliance
PSD2 & Open Banking: How to go from standards to implementation and compliance
 
Java 10 and beyond: Keeping up with the language and planning for the future
Java 10 and beyond: Keeping up with the language and planning for the futureJava 10 and beyond: Keeping up with the language and planning for the future
Java 10 and beyond: Keeping up with the language and planning for the future
 
How to keep developers happy and lawyers calm (Presented at ESC Boston)
How to keep developers happy and lawyers calm (Presented at ESC Boston)How to keep developers happy and lawyers calm (Presented at ESC Boston)
How to keep developers happy and lawyers calm (Presented at ESC Boston)
 
Open source applied - Real world use cases (Presented at Open Source 101)
Open source applied - Real world use cases (Presented at Open Source 101)Open source applied - Real world use cases (Presented at Open Source 101)
Open source applied - Real world use cases (Presented at Open Source 101)
 
How to migrate SourcePro apps from Solaris to Linux
How to migrate SourcePro apps from Solaris to LinuxHow to migrate SourcePro apps from Solaris to Linux
How to migrate SourcePro apps from Solaris to Linux
 
Approaches to debugging mixed-language HPC apps
Approaches to debugging mixed-language HPC appsApproaches to debugging mixed-language HPC apps
Approaches to debugging mixed-language HPC apps
 
Enterprise Linux: Justify your migration from Red Hat to CentOS
Enterprise Linux: Justify your migration from Red Hat to CentOSEnterprise Linux: Justify your migration from Red Hat to CentOS
Enterprise Linux: Justify your migration from Red Hat to CentOS
 
Walk through an enterprise Linux migration
Walk through an enterprise Linux migrationWalk through an enterprise Linux migration
Walk through an enterprise Linux migration
 
How to keep developers happy and lawyers calm
How to keep developers happy and lawyers calmHow to keep developers happy and lawyers calm
How to keep developers happy and lawyers calm
 

Recently uploaded

chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 

Recently uploaded (20)

chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 

Real-time searching of big data with Solr and Hadoop

  • 1. 1 Real-time searching of big data with Solr and Hadoop Rod Cope, CTO
  • 2. 2 • Introduction • The problem • The solution • Top 10 lessons • Final thoughts • Q&A
  • 3. 3 Rod Cope, CTO Rogue Wave Software
  • 4. 4 • “Big data” – All the world’s open source software – Metadata, code, indexes – Individual tables contain many terabytes – Relational databases aren’t scale-free • Growing every day • Need real-time random access to all data • Long-running and complex analysis jobs
  • 5. 5 • Hadoop, HBase, and Solr – Hadoop – distributed file system, map/reduce – HBase – “NoSQL” data store – column-oriented – Solr – search server based on Lucene – All are scalable, flexible, fast, well-supported, and used in production environments • And a supporting cast of thousands… – Stargate, MySQL, Rails, Redis, Resque, – Nginx, Unicorn, HAProxy, Memcached, – Ruby, JRuby, CentOS, …
  • 6. 6 Internet Application LAN Data LAN *Caching and load balancing not shown
  • 7. 7 • HBase  NoSQL – Think hash table, not relational database • How do find my data if primary key won’t cut it? • Solr to the rescue – Very fast, highly scalable search server with built-in sharding and replication – based on Lucene – Dynamic schema, powerful query language, faceted search, accessible via simple REST-like web API w/XML, JSON, Ruby, and other data formats
  • 8. 8 • Sharding – Query any server – it executes the same query against all other servers in the group – Returns aggregated result to original caller • Async replication (slaves poll their masters) – Can use repeaters if replicating across data centers • OpenLogic – Solr farm, sharded, cross-replicated, fronted with HAProxy • Load balanced writes across masters, reads across masters and slaves • Be careful not to over-commit – Billions of lines of code in HBase, all indexed in Solr for real-time search in multiple ways – Over 20 Solr fields indexed per source file
  • 9. 9
  • 10. 10
  • 11. 11
  • 12. 12
  • 13. 13
  • 14. 14
  • 15. 15 • Experiment with different Solr merge factors – During huge loads, it can help to use a higher factor for load performance • Minimize index manipulation gymnastics • Start with something like 25 – When you’re done with the massive initial load/import, turn it back down for search performance • Minimize number of queries • Start with something like 5 • Note that a small merge factor will hurt indexing performance if you need to do massive loads on a frequent basis or continuous indexing
  • 16. 16 • Test your write-focused load balancing – Look for large skews in Solr index size – Note: you may have to commit, optimize, write again, and commit before you can really tell • Make sure your replication slaves are keeping up – Using identical hardware helps – If index directories don’t look the same, something is wrong
  • 17. 17 • Don’t commit to Solr too frequently – It’s easy to auto-commit or commit after every record – Doing this 100’s of times per second will take Solr down, especially if you have serious warm up queries configured • Avoid putting large values in HBase (> 5MB) – Works, but may cause instability and/or performance issues – Rows and columns are cheap, so use more of them instead
  • 18. 18 • Don’t use a single machine to load the cluster – You might not live long enough to see it finish • At OpenLogic, we spread raw source data across many machines and hard drives via NFS – Be very careful with NFS configuration – can hang machines • Load data into HBase via Hadoop map/reduce jobs – Turn off WAL for much better performance • put.setWriteToWAL(false) – Index in Solr as you go • Good way to test your load balancing write schemes and replication set up – This will find your weak spots!
  • 19. 19 • Writing data loading jobs can be tedious • Scripting is faster and easier than writing Java • Great for system administration tasks, testing • Standard HBase shell is based on JRuby • Very easy Map/Reduce jobs with J/Ruby and Wukong • Used heavily at OpenLogic – Productivity of Ruby – Power of Java Virtual Machine – Ruby on Rails, Hadoop integration, GUI clients
  • 20. 20
  • 21. 21 JRuby list = ["Rod", "Neeta", "Eric", "Missy"] shorts = list.find_all { |name| name.size <= 4 } puts shorts.size shorts.each { |name| puts name } -> 2 -> Rod Eric Groovy list = ["Rod", "Neeta", "Eric", "Missy"] shorts = list.findAll { name -> name.size() <= 4 } println shorts.size shorts.each { name -> println name } -> 2 -> Rod Eric
  • 22. 22 • Hadoop – SPOF around Namenode, append functionality • HBase – Backup, replication, and indexing solutions in flux • Solr – Several competing solutions around cloud-like scalability and fault-tolerance, including ZooKeeper and Hadoop integration – No clear winner, none quite ready for production
  • 23. 23 • Many moving parts – It’s easy to let typos slip through – Consider automated configuration via Chef, Puppet, or similar • Pay attention to the details – Operating system – max open files, sockets, and other limits – Hadoop and HBase configuration • http://hbase.apache.org/book.html#trouble – Solr merge factor and norms • Don’t starve HBase or Solr for memory – Swapping will cripple your system
  • 24. 24 • “Commodity hardware” != 3 year old desktop • Dual quad-core, 32GB RAM, 4+ disks • Don’t bother with RAID on Hadoop data disks – Be wary of non-enterprise drives • Expect ugly hardware issues at some point
  • 25. 25 • Environment – 100+ CPU cores – 100+ Terabytes of disk – Machines don’t have identity – Add capacity by plugging in new machines • Why not Amazon EC2? – Great for computational bursts – Expensive for long-term storage of big data – Not yet consistent enough for mission-critical usage of HBase
  • 26. 26 • Dual quad-core and dual hex-core • Dell boxes • 32-64GB RAM – ECC (highly recommended by Google) • 6 x 2TB enterprise hard drives • RAID 1 on two of the drives – OS, Hadoop, HBase, Solr, NFS mounts (be careful!), job code, etc. – Key “source” data backups • Hadoop datanode gets remaining drives • Redundant enterprise switches • Dual- and quad-gigabit NIC’s
  • 27. 27 • Amazon EC2 – EBS Storage • 100TB * $0.10/GB/month = $120k/year – Double Extra Large instances • 13 EC2 compute units, 34.2GB RAM • 20 instances * $1.00/hr * 8,760 hrs/yr = $175k/year • 3 year reserved instances – 20 * 4k = $80k up front to reserve – (20 * $0.34/hr * 8,760 hrs/yr * 3 yrs) / 3 = $86k/year to operate – Totals for 20 virtual machines • 1st year cost: $120k + $80k + $86k = $286k • 2nd & 3rd year costs: $120k + $86k = $206k • Average: ($286k + $206k + $206k) / 3 = $232k/year
  • 28. 28 • Buy your own – 20 * Dell servers w/12 CPU cores, 32GB RAM, 5 TB disk = $160k • Over 33 EC2 compute units each – Total: $53k/year (amortized over 3 years)
  • 29. 29 • Amazon EC2 – 20 instances * 13 EC2 compute units = 260 EC2 compute units – Cost: $232k/year • Buy your own – 20 machines * 33 EC2 compute units = 660 EC2 compute units – Cost: $53k/year – Does not include hosting and maintenance costs • Don’t think system administration goes away – You still “own” all the instances – monitoring, debugging, support
  • 30. 30 • Hardware – Power supplies, hard drives • Operating System – Kernel panics, zombie processes, dropped packets • Software Servers – Hadoop datanodes, HBase regionservers, Stargate servers, Solr servers • Your Code and Data – Stray map/reduce jobs, strange corner cases in your data leading to program failures
  • 31. 31
  • 32. 32 • Hadoop, HBase, Solr • Apache, Tomcat, ZooKeeper, HAProxy • Stargate, JRuby, Lucene, Jetty, HSQLDB, Geronimo • Apache Commons, JUnit • CentOS • Dozens more • Too expensive to build or buy everything
  • 33. 33 • You can host big data in your own private cloud – Tools are available today that didn’t exist a few years ago – Fast to prototype – production readiness takes time – Expect to invest in training and support • Hbase and Solr are fast – 100+ random queries/sec per instance – Give them memory and stand back • Hbase Scales, Solr scales (to a point) – Don’t worry about out-growing a few machines – Do worry about out-growing a rack of Solr instances • Look for ways to partition your data other than “automatic” sharding
  • 35. 35 Rod Cope rod.cope@roguewave.com See us in action: www.roguewave.com
  • 36. 36