Realtime Analytics with Storm and Hadoop

DataWorks Summit
Storm + Hadoop

            @nathanmarz   1
So many Big Data technologies...



 Storm
            Kafka
                                   2
How to make these tools work
together?




                               3
Goals of a data system
• Low latency reads
• Low latency writes
• Fault-tolerant
• Scalable




                       4
What is a data system?


    Query = Function(All data)



                                 5
Is there a general purpose way to
compute arbitrary functions in
realtime?

                                    6
(What’s the title of this talk?)


                                   7
Example query


 Total number of pageviews to a
 URL over a range of time

                                  8
Example query




          Implementation   9
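The implementation shown on this slide is not preserved in the text. As a minimal sketch of Query = Function(All data), assuming each pageview is a hypothetical (url, timestamp) record held in memory, the example query could be written as a full scan:

```python
from typing import Iterable, NamedTuple


class Pageview(NamedTuple):
    """Hypothetical record layout for one raw pageview event."""
    url: str
    timestamp: int  # seconds since the epoch


def pageviews_over_range(all_data: Iterable[Pageview], url: str,
                         start: int, end: int) -> int:
    """Count pageviews to `url` with start <= timestamp < end by scanning every record."""
    return sum(1 for pv in all_data
               if pv.url == url and start <= pv.timestamp < end)


# Made-up sample data, for illustration only.
all_data = [
    Pageview("foo.com/blog", 3600),
    Pageview("foo.com/blog", 3700),
    Pageview("foo.com/about", 3800),
    Pageview("foo.com/blog", 7300),
]
```

Every call touches every record, so the cost grows with the size of the whole dataset.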
Too slow: “all data” is petabyte-scale


                                     10
Precomputation



    All data -> Query


                         11
Precomputation


    All data -> Precomputed view -> Query




                                   12
Example query
 All data (Pageview, Pageview, Pageview, ...) -> Precomputed view -> Query -> 2930

                                              13
Precomputation


    All data -> Function -> Precomputed view -> Function -> Query




                                                       15
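A minimal sketch of the two functions for the pageview example, assuming (url, timestamp) events and an hourly bucket granularity. Both are illustrative choices, not something the slides specify:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

SECONDS_PER_HOUR = 3600

# Assumed view layout: (url, hour bucket) -> pageview count.
View = Dict[Tuple[str, int], int]


def build_view(pageviews: Iterable[Tuple[str, int]]) -> View:
    """First function: fold raw (url, timestamp) events into per-hour counts."""
    view: Counter = Counter()
    for url, timestamp in pageviews:
        view[(url, timestamp // SECONDS_PER_HOUR)] += 1
    return dict(view)


def query(view: View, url: str, start_hour: int, end_hour: int) -> int:
    """Second function: sum precomputed buckets for start_hour <= hour < end_hour."""
    return sum(view.get((url, hour), 0) for hour in range(start_hour, end_hour))


# Made-up sample data, for illustration only.
raw = [("foo.com/blog", 3600), ("foo.com/blog", 3700),
       ("foo.com/about", 3800), ("foo.com/blog", 7300)]
view = build_view(raw)
```

The query now reads one bucket per hour in the range instead of scanning all data.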
Hadoop


 Great at computing arbitrary
 functions


                                16
Expressing those functions


                       Cascalog


                 Scalding
                                  17
Hadoop precomputation
   All data -> MapReduce workflow -> Batch view #1
   All data -> MapReduce workflow -> Batch view #2


                                                    18
Batch view database

Need a database that...
• Is batch-writable from Hadoop
• Has fast random reads




                                  19
Batch view database


  No random writes required!



                               20
Batch view database

Examples
• ElephantDB
• Voldemort
• Manhattan




                      21
Batch view database

• Extremely simple
• ElephantDB is only a few thousand lines of code




                                                    22
Hadoop precomputation




                        23
So we’re done, right?


                        24
Not quite...
• A batch workflow is too slow
• Views are out of date

             Absorbed into batch views | Not absorbed (just a few hours of data!)
             ---------------------------- Time ----------------------------> Now
                                                              25
Compensating for last few hours of data

New data stream -> Storm -> Realtime view #1
                         -> Realtime view #2

                                              26
Realtime views
Random read / random write databases
• Cassandra
• HBase
• Riak




                                       27
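A minimal sketch of why random writes are needed here: a realtime view is updated in place as each event arrives. The `RealtimeView` class is a hypothetical dict-backed stand-in for a store such as those listed above, and the hourly bucket is an assumption:

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Tuple

SECONDS_PER_HOUR = 3600


class RealtimeView:
    """Hypothetical in-memory stand-in for a random-read / random-write database."""

    def __init__(self) -> None:
        self._counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def increment(self, url: str, hour: int) -> None:
        """Random write: bump one counter in place."""
        self._counts[(url, hour)] += 1

    def get(self, url: str, hour: int) -> int:
        """Random read: fetch one counter, 0 if it was never written."""
        return self._counts.get((url, hour), 0)


def consume(stream: Iterable[Tuple[str, int]], view: RealtimeView) -> None:
    """Apply each (url, timestamp) event to the view as it arrives."""
    for url, timestamp in stream:
        view.increment(url, timestamp // SECONDS_PER_HOUR)
```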
Application queries

         Batch view


                        Merge
        Realtime view




                                28
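A minimal sketch of the merge step for the pageview example. It assumes both views map (url, hour) to a count, and that the realtime view only counts events not yet absorbed into the batch view, so adding the two never double-counts:

```python
from typing import Dict, Tuple

# Assumed view layout: (url, hour bucket) -> pageview count.
View = Dict[Tuple[str, int], int]


def merged_query(batch_view: View, realtime_view: View,
                 url: str, start_hour: int, end_hour: int) -> int:
    """Answer a range query by summing both views for start_hour <= hour < end_hour."""
    total = 0
    for hour in range(start_hour, end_hour):
        key = (url, hour)
        total += batch_view.get(key, 0) + realtime_view.get(key, 0)
    return total
```

For example, with a batch view covering hours 0 and 1 and a realtime view covering hour 2, `merged_query(batch, realtime, url, 0, 3)` returns the total across all three hours.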
Precomputation

   All data        -> Hadoop -> Precomputed batch view    \
                                                            -> Query
   New data stream -> Storm  -> Precomputed realtime view /


                                                      30
Storm

New data stream -> Storm -> Realtime view #1
                         -> Realtime view #2

                                             31
Storm
Realtime computation system
• Guarantees data will be processed
• Horizontally scalable
• Fault-tolerant
• Fast




                                      32
Storm

        Source stream




        Source stream
                        Storm


                                33
Storm Cluster




                34
Storm Cluster

       Nimbus: master node (similar to Hadoop JobTracker)   35
Storm Cluster

          ZooKeeper: used for cluster coordination   36
Storm Cluster

           Supervisors: run worker processes   37
Starting a topology




                      38
Killing a topology




                     39
Storm concepts
• Streams
• Spouts
• Bolts
• Topologies




                 40
Streams


    Tuple   Tuple   Tuple   Tuple   Tuple   Tuple   Tuple




               Unbounded sequence of tuples                 41
Spouts




         Source of streams   42
Spouts
• Read from Kestrel queue
• Read directly from Twitter streaming API




                                             43
Bolts




        44
Bolts
• Functions
• Filters
• Joins
• Aggregations
• Talk to databases




                      45
Topology




           46
Tasks




        47
Stream grouping




     When a tuple is emitted, to which task does it go?   48
Stream grouping

• Shuffle grouping: pick a random task
• Fields grouping: mod hashing on a subset of tuple fields
• All grouping: send to all tasks
• Global grouping: pick task with lowest id

                                                            49
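The four groupings can be sketched as functions that return the target task ids for one emitted tuple. This is an illustration in plain Python, not Storm's API: tasks are assumed to be numbered 0..n-1, and CRC32 is a stand-in hash chosen for determinism.

```python
import random
import zlib
from typing import List, Sequence


def shuffle_grouping(num_tasks: int, rng: random.Random) -> List[int]:
    """Pick a random task."""
    return [rng.randrange(num_tasks)]


def fields_grouping(values: Sequence[str], field_indexes: Sequence[int],
                    num_tasks: int) -> List[int]:
    """Mod-hash a subset of the tuple's fields; equal field values hit the same task."""
    key = "\x00".join(values[i] for i in field_indexes)
    return [zlib.crc32(key.encode("utf-8")) % num_tasks]


def all_grouping(num_tasks: int) -> List[int]:
    """Send the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(num_tasks: int) -> List[int]:
    """Send the tuple to the task with the lowest id (0 in this sketch)."""
    return [0]
```

Fields grouping is the one that makes per-key aggregation possible, because all tuples with the same key land on the same task.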
Streaming word count




                       50
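The code on these slides is not preserved in the text. As a rough sketch of the same pipeline in plain Python (not Storm's API), assume a spout that emits sentences, a bolt that splits them into words, and counting tasks fed by a fields grouping on the word. The sample sentences and the CRC32 hash are illustrative assumptions.

```python
import zlib
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def sentence_spout() -> Iterator[str]:
    """Stand-in spout; a real one would read an unbounded source such as a queue."""
    yield from ["the cow jumped over the moon",
                "the man went to the store"]


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Emit one word tuple per word in each sentence."""
    for sentence in sentences:
        yield from sentence.split()


class CountBolt:
    """One counting task; keeps running totals for the words routed to it."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, word: str) -> Tuple[str, int]:
        self.counts[word] += 1
        return word, self.counts[word]


def run_topology(num_count_tasks: int = 2) -> List[CountBolt]:
    """Wire spout -> split -> count, routing words by mod hashing (fields grouping)."""
    tasks = [CountBolt() for _ in range(num_count_tasks)]
    for word in split_bolt(sentence_spout()):
        task = tasks[zlib.crc32(word.encode("utf-8")) % num_count_tasks]
        task.execute(word)
    return tasks
```

Because routing depends only on the word, each word's count lives on exactly one task.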
Precomputation


    All data -> Hadoop + Storm -> Precomputed views -> Storm -> Query


                                                  57
Distributed RPC


 Sometimes there’s very little
 you can precompute


                                 58
Distributed RPC


 And you still require a lot of
 on-the-fly computation


                                  59
Example


 Reach is the number of unique
 people exposed to a URL on
 Twitter
                                 60
Reach
    URL -> Tweeters -> Followers of each tweeter -> Distinct followers

                                          61
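A single-process sketch of the reach computation. The two dictionaries are hypothetical stand-ins for the lookups a real system would make against its databases, and the names in them are made up:

```python
from typing import Dict, List, Set

# Hypothetical sample data: who tweeted a URL, and who follows each tweeter.
TWEETERS_BY_URL: Dict[str, List[str]] = {
    "foo.com/blog": ["sally", "bob", "tim"],
}
FOLLOWERS_BY_TWEETER: Dict[str, List[str]] = {
    "sally": ["bob", "tim", "alice"],
    "bob": ["sally", "nathan"],
    "tim": ["alex"],
}


def reach(url: str) -> int:
    """Count the distinct followers of everyone who tweeted `url`."""
    distinct: Set[str] = set()
    for tweeter in TWEETERS_BY_URL.get(url, []):
        distinct.update(FOLLOWERS_BY_TWEETER.get(tweeter, []))
    return len(distinct)
```

The set union is the expensive step at scale, since a popular URL can fan out to millions of followers, which is why little of it can be precomputed.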
Reach topology




                 62
Distributed RPC




                  63
Storm + HDFS

                     HDFS




      New data       Storm       Distributed RPC



  Use HBase-like strategy to reliably store state
  within Storm bolts
                                                    64
Storm + HDFS



 https://github.com/nathanmarz/storm-contrib/tree/master/storm-state




                      storm-state library                              65
Missing pieces
• Getting data into Storm
• Getting data into Hadoop




                             66
Getting data into Storm
Queuing system
• Kestrel
• Kafka
• RabbitMQ




                          67
Getting data into Hadoop
• Scribe
• Flume
• Kafka




                           68
Learn more




        http://manning.com/marz   69
Questions?




             70
1 of 83

Recommended

Introduction to Presto at Treasure Data by
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
1.7K views28 slides
How to build a streaming Lakehouse with Flink, Kafka, and Hudi by
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
488 views16 slides
Understanding Query Plans and Spark UIs by
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
4.7K views50 slides
How YugaByte DB Implements Distributed PostgreSQL by
How YugaByte DB Implements Distributed PostgreSQLHow YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQLYugabyte
1.3K views65 slides
Apache Kudu: Technical Deep Dive

 by
Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
4.5K views48 slides
Native Support of Prometheus Monitoring in Apache Spark 3.0 by
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Databricks
2.4K views41 slides

More Related Content

What's hot

Optimizing Delta/Parquet Data Lakes for Apache Spark by
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkDatabricks
1K views35 slides
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J... by
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
98.6K views44 slides
Easy, scalable, fault tolerant stream processing with structured streaming - ... by
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Databricks
4.8K views65 slides
Using Queryable State for Fun and Profit by
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitFlink Forward
258 views37 slides
Scalability, Availability & Stability Patterns by
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
516K views196 slides
Spark SQL Deep Dive @ Melbourne Spark Meetup by
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
9K views57 slides

What's hot(20)

Optimizing Delta/Parquet Data Lakes for Apache Spark by Databricks
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
Databricks1K views
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J... by Databricks
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Databricks98.6K views
Easy, scalable, fault tolerant stream processing with structured streaming - ... by Databricks
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Databricks4.8K views
Using Queryable State for Fun and Profit by Flink Forward
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
Flink Forward258 views
Scalability, Availability & Stability Patterns by Jonas Bonér
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
Jonas Bonér516K views
Spark SQL Deep Dive @ Melbourne Spark Meetup by Databricks
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
Databricks9K views
Making Apache Spark Better with Delta Lake by Databricks
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
Databricks5.4K views
Fine Tuning and Enhancing Performance of Apache Spark Jobs by Databricks
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Databricks2.5K views
Apache Spark Core—Deep Dive—Proper Optimization by Databricks
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
Databricks6.1K views
Deep Dive into GPU Support in Apache Spark 3.x by Databricks
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.x
Databricks2.3K views
Tuning Apache Ambari performance for Big Data at scale with 3000 agents by DataWorks Summit
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsTuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
DataWorks Summit2.2K views
Hadoop Query Performance Smackdown by DataWorks Summit
Hadoop Query Performance SmackdownHadoop Query Performance Smackdown
Hadoop Query Performance Smackdown
DataWorks Summit2.6K views
Top 5 Mistakes to Avoid When Writing Apache Spark Applications by Cloudera, Inc.
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Cloudera, Inc.127.8K views
Introduction to Apache Spark by Rahul Jain
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain24.8K views
Kafka as your Data Lake - is it Feasible? by Guido Schmutz
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
Guido Schmutz1.8K views
A Thorough Comparison of Delta Lake, Iceberg and Hudi by Databricks
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks11.1K views
Under the Hood of a Shard-per-Core Database Architecture by ScyllaDB
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
ScyllaDB1.3K views
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day by C4Media
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
C4Media1.6K views

Viewers also liked

Storm: distributed and fault-tolerant realtime computation by
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
232.6K views75 slides
Scaling Apache Storm - Strata + Hadoop World 2014 by
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
167.6K views80 slides
Resource Aware Scheduling in Apache Storm by
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormDataWorks Summit/Hadoop Summit
93.1K views38 slides
Yahoo compares Storm and Spark by
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and SparkChicago Hadoop Users Group
198.4K views27 slides
Hadoop Summit Europe 2014: Apache Storm Architecture by
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
188K views113 slides
Apache Storm 0.9 basic training - Verisign by
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
233.9K views129 slides

Viewers also liked(10)

Storm: distributed and fault-tolerant realtime computation by nathanmarz
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
nathanmarz232.6K views
Scaling Apache Storm - Strata + Hadoop World 2014 by P. Taylor Goetz
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
P. Taylor Goetz167.6K views
Hadoop Summit Europe 2014: Apache Storm Architecture by P. Taylor Goetz
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
P. Taylor Goetz188K views
Apache Storm 0.9 basic training - Verisign by Michael Noll
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
Michael Noll233.9K views
Kafka and Storm - event processing in realtime by Guido Schmutz
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
Guido Schmutz117.1K views
Kafka Tutorial Advanced Kafka Consumers by Jean-Paul Azar
Kafka Tutorial Advanced Kafka ConsumersKafka Tutorial Advanced Kafka Consumers
Kafka Tutorial Advanced Kafka Consumers
Jean-Paul Azar16.5K views
Apache storm vs. Spark Streaming by P. Taylor Goetz
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
P. Taylor Goetz210.5K views
Realtime Analytics with Hadoop and HBase by larsgeorge
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBase
larsgeorge20.3K views

Similar to Realtime Analytics with Storm and Hadoop

A real time architecture using Hadoop and Storm @ FOSDEM 2013 by
A real time architecture using Hadoop and Storm @ FOSDEM 2013A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013Nathan Bijnens
34K views67 slides
The Secrets of Building Realtime Big Data Systems by
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systemsnathanmarz
59.9K views50 slides
Performance Management in ‘Big Data’ Applications by
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsMichael Kopp
1.6K views46 slides
Steve Watt Presentation by
Steve Watt PresentationSteve Watt Presentation
Steve Watt PresentationBig Data Houston
515 views43 slides
Lean & agile with MongoDB by
Lean & agile with MongoDBLean & agile with MongoDB
Lean & agile with MongoDBJohannes Brandstetter
1K views92 slides
A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La... by
A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La...A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La...
A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La...jaxLondonConference
1.7K views75 slides

Similar to Realtime Analytics with Storm and Hadoop(20)

A real time architecture using Hadoop and Storm @ FOSDEM 2013 by Nathan Bijnens
A real time architecture using Hadoop and Storm @ FOSDEM 2013A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013
Nathan Bijnens34K views
The Secrets of Building Realtime Big Data Systems by nathanmarz
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systems
nathanmarz59.9K views
Performance Management in ‘Big Data’ Applications by Michael Kopp
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ Applications
Michael Kopp1.6K views
A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La... by jaxLondonConference
A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La...A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La...
A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La...
jaxLondonConference1.7K views
Large scale computing with mapreduce by hansen3032
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
hansen3032580 views
An introduction to apache drill presentation by MapR Technologies
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentation
MapR Technologies2.7K views
Large Scale Data Analysis Tools by boorad
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Tools
boorad2.8K views
Introduction to Spark - Phoenix Meetup 08-19-2014 by cdmaxime
Introduction to Spark - Phoenix Meetup 08-19-2014Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014
cdmaxime1.6K views
Fixing twitter by Roger Xia
Fixing twitterFixing twitter
Fixing twitter
Roger Xia546 views
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ... by xlight
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight444 views
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ... by smallerror
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror1.4K views
A real-time architecture using Hadoop and Storm @ JAX London by Nathan Bijnens
A real-time architecture using Hadoop and Storm @ JAX LondonA real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX London
Nathan Bijnens11.1K views
Tech4Africa - Opportunities around Big Data by Steve Watt
Tech4Africa - Opportunities around Big DataTech4Africa - Opportunities around Big Data
Tech4Africa - Opportunities around Big Data
Steve Watt1.2K views
John adams talk cloudy by John Adams
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
John Adams3.3K views

More from DataWorks Summit

Data Science Crash Course by
Data Science Crash CourseData Science Crash Course
Data Science Crash CourseDataWorks Summit
19.3K views47 slides
Floating on a RAFT: HBase Durability with Apache Ratis by
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
2.9K views20 slides
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi by
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
2.1K views19 slides
HBase Tales From the Trenches - Short stories about most common HBase operati... by
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
1.8K views18 slides
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... by
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
1.6K views74 slides
Managing the Dewey Decimal System by
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
1K views8 slides

More from DataWorks Summit(20)

Floating on a RAFT: HBase Durability with Apache Ratis by DataWorks Summit
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit2.9K views
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi by DataWorks Summit
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit2.1K views
HBase Tales From the Trenches - Short stories about most common HBase operati... by DataWorks Summit
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit1.8K views
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... by DataWorks Summit
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit1.6K views
Practical NoSQL: Accumulo's dirlist Example by DataWorks Summit
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit834 views
HBase Global Indexing to support large-scale data ingestion at Uber by DataWorks Summit
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit915 views
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix by DataWorks Summit
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit714 views
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi by DataWorks Summit
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit1.3K views
Supporting Apache HBase : Troubleshooting and Supportability Improvements by DataWorks Summit