1




              Faster, cheaper, better
                  Replacing Oracle with
                  Hadoop and Solr

                  Ken Krugler
                  Scale Unlimited


         Copyright (c) 2012 Scale Unlimited. All Rights Reserved.

Monday, June 11, 12
2




      Obligatory Background


             Ken Krugler - direct from Nevada City, California
             Krugle startup (2005-2008) used Nutch, Hadoop, Solr
             Now running Scale Unlimited
                     big data + search
                     consulting + training




3




      The 50,000ft View

             We helped our client kick the RDBMS habit
                     It’s an analytics web site for display advertising
                     Got rid of DBs handling queries for their web site
                     Now uses Hadoop + Solr to...
                             cut costs
                             add features
                             improve performance
                             increase scalability


4




      What’s an Analytics Web Site?

               Let the user ask questions about data




5




      Including Sexy Dashboards

               All driven by slices of the data




6




      Behind the web site curtain

             Each view or constraint change triggers queries
                     “sum ad impact for all advertisers on all networks, sort by sum, limit 10”
                     “sum ad impact by ad type for advertiser ‘oracle.com’”
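
To make the two example queries concrete, here is a minimal Python sketch of what each one computes. The record layout (advertiser, network, ad type, impact) and the sample values are assumptions for illustration, not the client's data. Both functions scan every record on every call, which hints at why this gets slow at millions of records.

```python
from collections import defaultdict

# Hypothetical sample records: (advertiser, network, ad_type, impact).
RECORDS = [
    ("oracle.com", "google", "image", 5.0),
    ("oracle.com", "doubleclick", "flash", 3.0),
    ("directv.com", "google", "text", 4.0),
    ("directv.com", "google", "image", 6.0),
]

def top_advertisers(records, limit=10):
    """Sum ad impact per advertiser over all networks, sort by sum, keep `limit`."""
    totals = defaultdict(float)
    for advertiser, _network, _ad_type, impact in records:
        totals[advertiser] += impact
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]

def impact_by_ad_type(records, advertiser):
    """Sum ad impact by ad type for a single advertiser."""
    totals = defaultdict(float)
    for adv, _network, ad_type, impact in records:
        if adv == advertiser:
            totals[ad_type] += impact
    return dict(totals)
```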


             For millions of records, you have to choose...
                     Fast, accurate, inexpensive - pick any two




7




      Combinatorial Explosion

             Too many possibilities to pre-calculate everything
                     more than 10^5 publishers
                     more than 10^6 advertisers
                     30 ad networks, 3 day ranges, etc


             So many trillions of possible combinations
                     Caching of DB query results isn’t very useful
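
The "trillions" figure follows directly from the counts on this slide. A quick check in Python, treating each "more than" figure as a lower bound and ignoring the "etc" dimensions:

```python
publishers = 10**5    # "more than 10^5 publishers"
advertisers = 10**6   # "more than 10^6 advertisers"
ad_networks = 30
day_ranges = 3

# Lower bound on distinct (publisher, advertiser, network, day range) slices.
combinations = publishers * advertisers * ad_networks * day_ranges
print(f"{combinations:.1e}")  # prints 9.0e+12
```

With roughly nine trillion possible slices, any single cached query result is unlikely to be requested again.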



8




      Trouble in UI Land


             UI refresh took 10-30 seconds
             Well outside of target range of “about a second or so”

                     0.1 second: instantaneous
                     1.0 second: I’m still in the flow
                     10 seconds: I’m bored




9




      Trouble in the back office

             Beefy hardware for multiple DBs was expensive
                     AWS monthly cost approaching 5 figures
                     And the data sets needed to grow significantly


             Constant schema changes meant painful data reloading
                     Extract, load, transform (inside of DB)
                     Re-indexing of DB fields



10




      A New Approach

             Do analytics off-line using Hadoop
                     Pre-generate as much as possible
             Use Solr as a NoSQL database
                     And leverage search, faceting




11




      Obligatory Architectural Slide

             Two search servers
             8 shards per index
                     Optimize response time
             Additional indexes
                     autocompletion, etc.
             200M total documents




12




      What Solr Gives Us

             Fast, memory-efficient queries
                     Count the number of documents that match a query
                     Sort results by fields
                     And search - “Find all Flash ads with the word ‘diet’”


             Fast faceting
                     Count # of results from query that have different values for a field
                     “How many different image ad sizes (w/counts) are used by google?”
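
As a rough illustration of the faceting question above, the sketch below builds a Solr select URL using the standard `q`, `rows`, `facet`, `facet.field` and `wt` request parameters. The host, the `/solr/select` path, and the `ad_type` and `image_size` field names are assumptions; only `network` appears in the schema shown in this deck.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and core path for a real setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

def facet_query_url(query, facet_field):
    """Build a URL that returns only facet counts (rows=0) for `facet_field`."""
    params = [
        ("q", query),
        ("rows", 0),                  # no documents needed, just the counts
        ("facet", "true"),
        ("facet.field", facet_field),
        ("wt", "json"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)

# "How many different image ad sizes (w/counts) are used by google?"
url = facet_query_url("network:google AND ad_type:image", "image_size")
```

Setting `rows=0` keeps the response small: the answer is in the facet counts, not in the matching documents.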


13




      How to Connect the Dots
             We have web crawl data - ads, advertisers, publishers, networks
                     http://www.michiguide.com/some-page.html text google
                     DIRECTV® For Businesses Save $13/mo ww.directv.com/business

             We have target Solr schemas with the fields defined

            <field name="network" type="string" indexed="true" stored="false" required="true" />
            <field name="publisher" type="string" indexed="true" stored="false" required="true" />


             How do we get from A to B?


                     Data Sources  ->  f(data)???  ->  Index

14




      Hadoop ETL


             Implement appropriate Extract, Transform, Load
                     Extract is just parsing text files that are stored in Amazon’s S3
                     Load is building the Solr index and deploying it to the search servers
                     What about that pesky “Transform” part?
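
The Extract step can be sketched as simple line parsing. The example below assumes a whitespace-separated layout of page URL, ad type and network, guessed from the sample crawl line shown earlier; the real file format is not given in the slides, and `parse_crawl_line` is a hypothetical helper.

```python
from urllib.parse import urlparse

def parse_crawl_line(line):
    """Parse one crawl record into the fields a Solr document would need.

    Assumed layout: "<page url> <ad type> <network>".
    """
    page_url, ad_type, network = line.split()
    return {
        "publisher": urlparse(page_url).hostname,  # site that displayed the ad
        "ad_type": ad_type,
        "network": network,
    }

doc = parse_crawl_line("http://www.michiguide.com/some-page.html text google")
```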




15




      Simplicity Itself

           25 Hadoop Jobs
           Developed with Cascading
           Daily run is $25




16




      Workflow Essentials

             “Do analytics offline” means anything that involves aggregation
             Solr is fine for first/last/count
             Pre-calculate anything that does math on each record
             Essentially, the index is pre-calculated answers to 200M questions
                     “what is trendline for ad impact of this advertiser on that publisher?”
                     “which ads use 300x250 images?”
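
One way to picture "pre-calculated answers" is a batch step that does the aggregation once and emits one document per question, so that serving a query becomes a lookup. This is a minimal single-machine sketch, not the Cascading workflow from the talk; the field names and the use of "all" as a wildcard value are assumptions.

```python
from collections import defaultdict

def precalculate(records):
    """Turn (advertiser, publisher, impact) records into one doc per question."""
    sums = defaultdict(float)
    for advertiser, publisher, impact in records:
        # Each record contributes to its own slice and to every "all" roll-up.
        sums[(advertiser, publisher)] += impact
        sums[(advertiser, "all")] += impact
        sums[("all", publisher)] += impact
        sums[("all", "all")] += impact
    return [
        {"advertiser": a, "publisher": p, "impact_sum": total}
        for (a, p), total in sorted(sums.items())
    ]

docs = precalculate([
    ("a.com", "p1", 1.0),
    ("a.com", "p2", 2.0),
    ("b.com", "p1", 4.0),
])
```

At query time the search server only has to find the document for a given advertiser and publisher pair; no math runs per record.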




17




      Combinatorial Explosion

             Limit questions that can be asked
                     E.g. no arbitrary date ranges
                     Requires tricky “biggest bang for buck” decisions


             Collapse the “all” entry and the specific entry when there is only one other value
                     Leverage Solr multi-value field support
                     network:all and network:doubleclick are one entry
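
A sketch of the collapsing idea, under the assumption that it works as follows: when an advertiser appears on only one network, the "all" roll-up equals that network's entry, so a single document with a multi-valued `network` field can answer both questions. The function and field names are hypothetical.

```python
def collapse_network_entries(per_network):
    """Build docs for one advertiser from a {network: impact} mapping."""
    if len(per_network) == 1:
        # "all" and the single network carry the same numbers: emit one doc
        # whose multi-valued field matches network:all and network:<name>.
        ((network, impact),) = per_network.items()
        return [{"network": ["all", network], "impact_sum": impact}]
    docs = [{"network": ["all"], "impact_sum": sum(per_network.values())}]
    for network, impact in sorted(per_network.items()):
        docs.append({"network": [network], "impact_sum": impact})
    return docs
```

For advertisers that use a single network this halves the number of documents along that dimension.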



18




      Reduce Duplicated Data

             De-normalized schema means multiple records with similar data
                     “ad X on network Y”, “ad X on network Z”
                     We couldn’t use Solr’s “join” support (not in 3.6, issues with shards)


             Non-indexed duplicated data goes into “special” records
                     e.g. the records that have “all” for a field value




19




      Defer Workflow Optimizations


             Frequently tempted to get tricky
                     But helicopter stunts lead to pain and suffering


             Often complex ETL means running multiple jobs in parallel
                     So job timing/prioritization is more important




20




      Analyzing Workflows

             Sadly, hand analysis is currently required

             Key is no dead time
                     i.e. no idle map/reduce slots


             New solutions
                     Ambrose
                     Driven



21




      Useful Optimizations

             “Cache” results - HDFS storage is cheap
                     Daily processing
                     Daily state + delta from today
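
Read as "keep yesterday's aggregated state in HDFS and fold in only today's delta", the caching idea looks roughly like this. It is a sketch under that assumption; `apply_delta` and the key names are hypothetical, and a real job would read and write files rather than dicts.

```python
def apply_delta(state, delta):
    """Merge today's per-key sums into the saved running state."""
    merged = dict(state)  # leave the saved state untouched
    for key, value in delta.items():
        merged[key] = merged.get(key, 0) + value
    return merged

yesterday = {"oracle.com": 5, "directv.com": 1}
today = {"oracle.com": 2, "google.com": 3}
new_state = apply_delta(yesterday, today)
```

Only the new day's records need a full pass; the accumulated history is read once as a compact summary.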


             Throw away data ASAP - avoid data baggage
                     Analytics data sets often have many, many fields




22




      Map-side Reduction
             Reduce the amount of data being sent from map to reduce
                     This is often the bottleneck for jobs, due to network overhead
                     Examples include aggregation, group-level filtering


             Hadoop has “combiners”, which are post-map reducers
                     Do incremental reduce on map side before sending to reducers


             Cascading has “AggregateBy”, which is an in-map reducer
                     Keeps some number of results in memory using LRU queue
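
The in-map idea can be illustrated with a small bounded cache. This is not Cascading's implementation, only a sketch of the technique: partial sums are held for at most `capacity` keys, and the least recently used key is flushed downstream when the cache overflows. The reducer still has to sum the partials for each key.

```python
from collections import OrderedDict

def map_side_sum(pairs, capacity):
    """Yield partial (key, sum) pairs while holding at most `capacity` keys."""
    cache = OrderedDict()
    for key, value in pairs:
        if key in cache:
            cache[key] += value
            cache.move_to_end(key)               # mark as most recently used
        else:
            cache[key] = value
            if len(cache) > capacity:
                yield cache.popitem(last=False)  # flush least recently used
    yield from cache.items()                     # flush what is left at the end

pairs = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("b", 1)]
partials = list(map_side_sum(pairs, capacity=2))
```

With frequently repeated keys, far fewer pairs cross the network than the mapper consumed; with mostly unique keys the cache helps little.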

23




      Avoid Heuristics in Hadoop


             What’s easy to describe (and implement) in a function...
                     is often painful and slow in map-reduce


             Conditional/branching logic is a common example
                     If this join result matches X, use it; otherwise join with Y and do Z
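
Written as an ordinary function, the branching rule above is only a few lines. The lookup tables and the `do_z` callback are hypothetical stand-ins for X, Y and Z.

```python
def resolve(key, primary, fallback, do_z):
    """Use the primary join result if present; otherwise join with fallback and apply do_z."""
    if key in primary:
        return primary[key]
    if key in fallback:
        return do_z(fallback[key])
    return None
```

In map-reduce the same rule typically turns into a join against the first data set, a split of matched and unmatched records, a second join for the unmatched ones, and a merge of the two streams, each of which can cost a full pass over the data.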




24




      The Net-Net


             If you have a web site that provides analytics
             And it’s currently using an RDBMS like Oracle
             You should be able to make it faster, cheaper, better (and scalable)
             Using Hadoop & Solr




25




      Questions?

             Feel free to contact me
                     http://www.scaleunlimited.com/contact/


             Check out Lucid’s “Big Data & Solr” class
                     http://www.lucidimagination.com/services/training/


             Check out Cascading
                     http://www.cascading.org/



More Related Content

Similar to Faster Cheaper Better-Replacing Oracle with Hadoop & Solr

Information Retrieval, Applied Statistics and Mathematics onBigData - German ...
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...Information Retrieval, Applied Statistics and Mathematics onBigData - German ...
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...Romeo Kienzler
 
New Features of OBIEE 11.1.1.6.x
New Features of OBIEE 11.1.1.6.x New Features of OBIEE 11.1.1.6.x
New Features of OBIEE 11.1.1.6.x
Capgemini
 
DDN Accelerating-Decisions-Through-Enterprise-Hadoop-final
DDN Accelerating-Decisions-Through-Enterprise-Hadoop-finalDDN Accelerating-Decisions-Through-Enterprise-Hadoop-final
DDN Accelerating-Decisions-Through-Enterprise-Hadoop-finalIntelHealthcare
 
CA_Plex_SupportForModernizingIBM_DB2_for_i
CA_Plex_SupportForModernizingIBM_DB2_for_iCA_Plex_SupportForModernizingIBM_DB2_for_i
CA_Plex_SupportForModernizingIBM_DB2_for_iGeorge Jeffcock
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseDataWorks Summit
 
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseHadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Cloudera, Inc.
 
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Jonathan Seidman
 
Pivotal: Virtualize Big Data to Make the Elephant Dance
Pivotal: Virtualize Big Data to Make the Elephant DancePivotal: Virtualize Big Data to Make the Elephant Dance
Pivotal: Virtualize Big Data to Make the Elephant Dance
EMC
 
Database Development: The Object-oriented and Test-driven Way
Database Development: The Object-oriented and Test-driven WayDatabase Development: The Object-oriented and Test-driven Way
Database Development: The Object-oriented and Test-driven Way
TechWell
 
You Too Can Be a Radio Host Or How We Scaled a .NET Startup And Had Fun Doing It
You Too Can Be a Radio Host Or How We Scaled a .NET Startup And Had Fun Doing ItYou Too Can Be a Radio Host Or How We Scaled a .NET Startup And Had Fun Doing It
You Too Can Be a Radio Host Or How We Scaled a .NET Startup And Had Fun Doing It
Aleksandr Yampolskiy
 
Introducing Neo4j graph database
Introducing Neo4j graph databaseIntroducing Neo4j graph database
Introducing Neo4j graph database
Amirhossein Saberi
 
Accelerating big data with ioMemory and Cisco UCS and NOSQL
Accelerating big data with ioMemory and Cisco UCS and NOSQLAccelerating big data with ioMemory and Cisco UCS and NOSQL
Accelerating big data with ioMemory and Cisco UCS and NOSQL
Sumeet Bansal
 
Sharepoint and SQL Server 2012
Sharepoint and SQL Server 2012Sharepoint and SQL Server 2012
Sharepoint and SQL Server 2012
James Tramel
 
GoldenGate Case Study - Enterprise IT
GoldenGate Case Study - Enterprise ITGoldenGate Case Study - Enterprise IT
GoldenGate Case Study - Enterprise ITPaul Steffensen
 
Using SAP Crystal Reports as a Linked (Open) Data Front-End via ODBC
Using SAP Crystal Reports as a Linked (Open) Data Front-End via ODBCUsing SAP Crystal Reports as a Linked (Open) Data Front-End via ODBC
Using SAP Crystal Reports as a Linked (Open) Data Front-End via ODBC
Kingsley Uyi Idehen
 
Hadoop, SQL & NoSQL: No Longer an Either-or Question
Hadoop, SQL & NoSQL: No Longer an Either-or QuestionHadoop, SQL & NoSQL: No Longer an Either-or Question
Hadoop, SQL & NoSQL: No Longer an Either-or Question
Tony Baer
 
Hadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionHadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionDataWorks Summit
 

Similar to Faster Cheaper Better-Replacing Oracle with Hadoop & Solr (20)

Running a Lean Startup with AWS
Running a Lean Startup with AWSRunning a Lean Startup with AWS
Running a Lean Startup with AWS
 
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...Information Retrieval, Applied Statistics and Mathematics onBigData - German ...
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...
 
New Features of OBIEE 11.1.1.6.x
New Features of OBIEE 11.1.1.6.x New Features of OBIEE 11.1.1.6.x
New Features of OBIEE 11.1.1.6.x
 
DDN Accelerating-Decisions-Through-Enterprise-Hadoop-final
DDN Accelerating-Decisions-Through-Enterprise-Hadoop-finalDDN Accelerating-Decisions-Through-Enterprise-Hadoop-final
DDN Accelerating-Decisions-Through-Enterprise-Hadoop-final
 
CA_Plex_SupportForModernizingIBM_DB2_for_i
CA_Plex_SupportForModernizingIBM_DB2_for_iCA_Plex_SupportForModernizingIBM_DB2_for_i
CA_Plex_SupportForModernizingIBM_DB2_for_i
 
Antonio piraino v1
Antonio piraino v1Antonio piraino v1
Antonio piraino v1
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the Enterprise
 
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseHadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
 
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
 
Pivotal: Virtualize Big Data to Make the Elephant Dance
Pivotal: Virtualize Big Data to Make the Elephant DancePivotal: Virtualize Big Data to Make the Elephant Dance
Pivotal: Virtualize Big Data to Make the Elephant Dance
 
Pass bac jd_sm
Pass bac jd_smPass bac jd_sm
Pass bac jd_sm
 
Database Development: The Object-oriented and Test-driven Way
Database Development: The Object-oriented and Test-driven WayDatabase Development: The Object-oriented and Test-driven Way
Database Development: The Object-oriented and Test-driven Way
 
You Too Can Be a Radio Host Or How We Scaled a .NET Startup And Had Fun Doing It
You Too Can Be a Radio Host Or How We Scaled a .NET Startup And Had Fun Doing ItYou Too Can Be a Radio Host Or How We Scaled a .NET Startup And Had Fun Doing It
You Too Can Be a Radio Host Or How We Scaled a .NET Startup And Had Fun Doing It
 
Introducing Neo4j graph database
Introducing Neo4j graph databaseIntroducing Neo4j graph database
Introducing Neo4j graph database
 
Accelerating big data with ioMemory and Cisco UCS and NOSQL
Accelerating big data with ioMemory and Cisco UCS and NOSQLAccelerating big data with ioMemory and Cisco UCS and NOSQL
Accelerating big data with ioMemory and Cisco UCS and NOSQL
 
Sharepoint and SQL Server 2012
Sharepoint and SQL Server 2012Sharepoint and SQL Server 2012
Sharepoint and SQL Server 2012
 
GoldenGate Case Study - Enterprise IT
GoldenGate Case Study - Enterprise ITGoldenGate Case Study - Enterprise IT
GoldenGate Case Study - Enterprise IT
 
Using SAP Crystal Reports as a Linked (Open) Data Front-End via ODBC
Using SAP Crystal Reports as a Linked (Open) Data Front-End via ODBCUsing SAP Crystal Reports as a Linked (Open) Data Front-End via ODBC
Using SAP Crystal Reports as a Linked (Open) Data Front-End via ODBC
 
Hadoop, SQL & NoSQL: No Longer an Either-or Question
Hadoop, SQL & NoSQL: No Longer an Either-or QuestionHadoop, SQL & NoSQL: No Longer an Either-or Question
Hadoop, SQL & NoSQL: No Longer an Either-or Question
 
Hadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionHadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or question
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Faster Cheaper Better-Replacing Oracle with Hadoop & Solr

  • 1. Faster, cheaper, better
      Replacing Oracle with Hadoop and Solr
      Ken Krugler
      Scale Unlimited
      Copyright (c) 2012 Scale Unlimited. All Rights Reserved.
      Monday, June 11, 12
  • 2. Obligatory Background
      Ken Krugler - direct from Nevada City, California
      Krugle startup (2005-2008) used Nutch, Hadoop, Solr
      Now running Scale Unlimited
          big data + search
          consulting + training
  • 3. The 50,000ft View
      We helped our client kick the RDBMS habit
          It’s an analytics web site for display advertising
          Got rid of DBs handling queries for their web site
          Now uses Hadoop + Solr to...
              cut costs
              add features
              improve performance
              increase scalability
  • 4. What’s an Analytics Web Site?
      Let the user ask questions about data
  • 5. Including Sexy Dashboards
      All driven by slices of the data
  • 6. Behind the web site curtain
      Each view or constraint change triggers queries
          “sum ad impact for all advertisers on all networks, sort by sum, limit 10”
          “sum ad impact by ad type for advertiser ‘oracle.com’”
      For millions of records, you have to choose...
          Fast, accurate, inexpensive - pick any two
  • 7. Combinatorial Explosion
      Too many possibilities to pre-calculate everything
          more than 10^5 publishers
          more than 10^6 advertisers
          30 ad networks, 3 day ranges, etc
      So many trillions of possible combinations
      Caching of DB query results isn’t very useful
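As a rough check on "trillions", multiplying the slide's lower bounds gives the order of magnitude. This is only a sketch: it treats the "more than" figures as exact and ignores the "etc" dimensions, so the real number is larger.

```python
# Lower-bound counts taken from the slide; the real numbers are larger.
publishers = 10**5
advertisers = 10**6
ad_networks = 30
day_ranges = 3

# Every (publisher, advertiser, network, day range) tuple is a possible query.
combinations = publishers * advertisers * ad_networks * day_ranges
print(f"{combinations:.1e}")  # 9.0e+12
```

Even at these lower bounds that is about nine trillion cells, which is why neither pre-calculating everything nor caching DB query results is practical.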
  • 9. Trouble in UI Land
      UI refresh took 10-30 seconds
      Well outside of target range of “about a second or so”
          0.1 second: instantaneous
          1.0 second: I’m still in the flow
          10 seconds: I’m bored
  • 10. Trouble in the back office
      Beefy hardware for multiple DBs was expensive
          AWS monthly cost approaching 5 figures
          And the data sets needed to grow significantly
      Constant schema changes meant painful data reloading
          Extract, load, transform (inside of DB)
          Re-indexing of DB fields
  • 11. A New Approach
      Do analytics off-line using Hadoop
          Pre-generate as much as possible
      Use Solr as a NoSQL database
          And leverage search, faceting
  • 12. Obligatory Architectural Slide
      Two search servers
      8 shards per index
          Optimize response time
      Additional indexes
          autocompletion, etc.
      200M total documents
  • 13. What Solr Gives Us
      Fast, memory-efficient queries
          Count the number of documents that match a query
          Sort results by fields
          And search - “Find all Flash ads with the word ‘diet’”
      Fast faceting
          Count # of results from query that have different values for a field
          “How many different image ad sizes (w/counts) are used by google?”
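The faceting question on this slide maps onto Solr's standard request parameters (`q`, `fq`, `rows`, `facet`, `facet.field`, `facet.mincount`). The sketch below only builds the request URL. The host, the core name `ads`, and the field names `image_size` and `ad_type` are assumptions made for illustration; only `network` appears in the schema shown in the deck.

```python
from urllib.parse import urlencode

def facet_query_url(base_url, query, facet_field, filters=()):
    """Build a Solr select URL that returns only facet counts for `query`."""
    params = [
        ("q", query),
        ("rows", "0"),            # we want the counts, not the matching documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", "1"),  # skip field values with no matches
    ]
    params.extend(("fq", f) for f in filters)
    return base_url + "/select?" + urlencode(params)

url = facet_query_url(
    "http://localhost:8983/solr/ads",  # hypothetical host and core name
    "network:google",
    "image_size",                      # hypothetical field name
    filters=["ad_type:image"],         # hypothetical field name
)
print(url)
```

Setting `rows` to 0 keeps the response small: the facet counts carry the whole answer, so no documents need to be returned.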
  • 14. How to Connect the Dots
      We have web crawl data - ads, advertisers, publishers, networks
          http://www.michiguide.com/some-page.html text google DIRECTV® For Businesses Save $13/mo ww.directv.com/business
      We have target Solr schemas with the fields defined
          <field name="network" type="string" indexed="true" stored="false" required="true" />
          <field name="publisher" type="string" indexed="true" stored="false" required="true" />
      How do we get from A to B?
          Data Sources -> f(data)??? -> Index
  • 15. Hadoop ETL
      Implement appropriate Extract, Transform, Load
          Extract is just parsing text files that are stored in Amazon’s S3
          Load is building the Solr index and deploying it to the search servers
      What about that pesky “Transform” part?
  • 16. Simplicity Itself
      25 Hadoop Jobs
      Developed with Cascading
      Daily run is $25
  • 17. Workflow Essentials
      “Do analytics offline” means anything that involves aggregation
          Solr is fine for first/last/count
          Pre-calculate anything that does math on each record
      Essentially index is pre-calculated answers to 200M questions
          “what is trendline for ad impact of this advertiser on that publisher?”
          “which ads use 300x250 images?”
  • 18. Combinatorial Explosion
      Limit questions that can be asked
          E.g. no arbitrary date ranges
          Requires tricky “biggest bang for buck” decisions
      Collapse entries that are “all” and only one other value
          Leverage Solr multi-value field support
          network:all and network:doubleclick are one entry
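The collapse rule can be illustrated with a small sketch. It assumes the offline job has already summed ad impact per (advertiser, network); the record layout, the `impact` field, and the sample numbers are hypothetical. When an advertiser appears on only one network, the "all" rollup carries the same data as that network's entry, so a single entry with a multi-valued `network` field can answer both queries.

```python
from collections import defaultdict

def build_entries(impact_by_network):
    """impact_by_network maps (advertiser, network) -> summed ad impact.

    Returns index entries, each with a multi-valued "network" field.
    """
    per_advertiser = defaultdict(dict)
    for (advertiser, network), impact in impact_by_network.items():
        per_advertiser[advertiser][network] = impact

    entries = []
    for advertiser, networks in sorted(per_advertiser.items()):
        total = sum(networks.values())
        if len(networks) == 1:
            # The "all" rollup is identical to the single network's entry,
            # so store one entry that matches both values.
            (network,) = networks
            entries.append({"advertiser": advertiser,
                            "network": ["all", network],
                            "impact": total})
            continue
        entries.append({"advertiser": advertiser, "network": ["all"], "impact": total})
        for network, impact in sorted(networks.items()):
            entries.append({"advertiser": advertiser, "network": [network], "impact": impact})
    return entries

sample = {
    ("oracle.com", "doubleclick"): 40,
    ("directv.com", "doubleclick"): 10,
    ("directv.com", "google"): 25,
}
entries = build_entries(sample)
for entry in entries:
    print(entry)
```

With this sample, the uncollapsed form would need six entries (an "all" entry plus one per network for each advertiser); collapsing the single-network advertiser brings that down to four.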
  • 19. Reduce Duplicated Data
      De-normalized schema means multiple records with similar data
          “ad X on network Y”, “ad X on network Z”
      We couldn’t use Solr’s “join” support (not in 3.6, issues with shards)
      Non-indexed duplicated data goes into “special” records
          e.g. the records that have “all” for a field value
  • 20. Defer Workflow Optimizations
      Frequently tempted to get tricky
          But helicopter stunts lead to pain and suffering
      Often complex ETL means running multiple jobs in parallel
          So job timing/prioritization is more important
  • 21. Analyzing Workflows
      Sadly, hand analysis is currently required
          Key is no dead time in map/reduce slots
      New solutions
          Ambrose
          Driven
  • 22. Useful Optimizations
      “Cache” results - HDFS storage is cheap
          Daily processing
          Daily state + delta from today
      Throw away data ASAP - avoid data baggage
          Analytics data sets often have many, many fields
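One reading of "daily state + delta from today" is that the workflow keeps the previous day's aggregated results and merges in only the newly crawled data, rather than re-aggregating all history. A minimal sketch of that merge, with hypothetical keys and counts:

```python
def apply_delta(state, delta):
    """Return a new state: yesterday's per-key totals plus today's delta."""
    merged = dict(state)  # copy, so the cached state stays untouched
    for key, count in delta.items():
        merged[key] = merged.get(key, 0) + count
    return merged

# Hypothetical (advertiser, network) -> observed ad count.
yesterday = {("oracle.com", "google"): 120, ("directv.com", "google"): 45}
today_delta = {("oracle.com", "google"): 7, ("scaleunlimited.com", "doubleclick"): 3}

today = apply_delta(yesterday, today_delta)
print(today)
```

The work per day is then proportional to the size of the delta plus a pass over the cached state, not to the full history of raw crawl data.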
  • 23. Map-side Reduction
      Reduce the amount of data being sent from map to reduce
          Often is bottleneck for jobs, due to network overhead
          Examples include aggregation, group-level filtering
      Hadoop has “combiners”, which are post-map reducers
          Do incremental reduce on map side before sending to reducers
      Cascading has “AggregateBy”, which are in-map reducers
          Keeps some number of results in memory using LRU queue
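The in-map idea can be sketched in plain Python. This is not Cascading's AggregateBy implementation, only an illustration of the technique the slide describes: keep a bounded number of partial sums in an LRU structure, emit a partial sum when it is evicted, and let the reducer add the partials together. The function names and the `capacity` parameter are made up for this example.

```python
from collections import OrderedDict, defaultdict

def map_side_sum(records, capacity):
    """Yield (key, partial_sum) pairs, keeping at most `capacity` keys in memory."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = OrderedDict()
    for key, value in records:
        if key in cache:
            cache[key] += value
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                # Evict the least recently used partial sum to the reducer.
                yield cache.popitem(last=False)
            cache[key] = value
    # Flush whatever is left at the end of the map task.
    yield from cache.items()

def reduce_sum(pairs):
    """Reducer: add up the partial sums for each key."""
    totals = defaultdict(int)
    for key, partial in pairs:
        totals[key] += partial
    return dict(totals)

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
emitted = list(map_side_sum(records, capacity=2))
totals = reduce_sum(emitted)
print(len(records), "records in,", len(emitted), "pairs sent to the reducer")
print(totals)
```

The saving depends on key locality and cache size: keys that recur while still cached are folded together, while a key that keeps getting evicted is sent more than once. The totals are correct either way because summing is associative.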
  • 24. Avoid Heuristics in Hadoop
      What’s easy to describe (and implement) in a function...
          is often painful and slow in map-reduce
      Conditional/branching logic is common example
          If this join result matches X, use it; otherwise join with Y and do Z
  • 29. The Net-Net
      If you have a web site that provides analytics
      And it’s currently using a RDBMS like Oracle
      You should be able to make it faster, cheaper, better (and scalable)
      Using Hadoop & Solr
  • 30. Questions?
      Feel free to contact me
          http://www.scaleunlimited.com/contact/
      Check out Lucid’s “Big Data & Solr” class
          http://www.lucidimagination.com/services/training/
      Check out Cascading
          http://www.cascading.org/