Hadoop Image Processing Pipeline (HIP)
June 10, 2015
Russell Foltz-Smith
Anil Gupta
Image Processing Pipeline
● Acquire Images of Vehicle
● Identify updates/deletes to Images
● Generate unique URL for Images
● Crop and Resize Images
● Copy images to Asset Servers
● Dedupe Images
Image Processing Pipeline Example
HIP
Why Hadoop?
● High Scalability
● Store historical data of Images
● Fault tolerance
● Identify updates to images on the basis of the content of the URL
HIP Components
1. HBase: Datastore for Images and for archiving Images
2. MapReduce: Computation engine for the Image Processor
3. Kafka: Publisher/Subscriber for pushing images to Asset Servers
4. OpenCV Java: Image Processing library
5. Avro: Serialization library for storing data on HDFS
HBase Data Model
Tables:
1. IMAGE: Stores the current set of Images with some metadata
2. IMAGE_ARCHIVE: Stores historical data of Vehicles and Original Images
HBase Data Model
Table: IMAGE
RowKey: <Vin_Number>
Column Family | Description | Versions
I | Stores all images of a vehicle, one image per column | 1
H | Stores metadata of all images | 1
Read patterns for “I” and “H” are mutually exclusive.
HBase Data Model
Table: IMAGE_ARCHIVE
RowKey: <Provider_id><Dealer_Id><vehicle_vin><Image_Index>
Column Family | Description | Versions
I | Stores original images of a vehicle; only 1 column is stored | 10
A | Stores fields of the Avro object of Vehicle and Image for analytics | 10
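A minimal Python sketch of how a composite row key in the order <Provider_id><Dealer_Id><vehicle_vin><Image_Index> could be assembled. The fixed field widths, the zero padding, and the names archive_row_key and vehicle_prefix are assumptions for illustration; the slides do not say how the fields are encoded.

```python
def archive_row_key(provider_id: int, dealer_id: int, vin: str, image_index: int) -> bytes:
    """Concatenate the four fields in the order shown on the slide."""
    # Assumed widths: 8 digits per id, VIN padded to 17 characters,
    # 3 digits for the image index. Fixed widths make byte-wise ordering
    # follow the field ordering.
    return f"{provider_id:08d}{dealer_id:08d}{vin:<17}{image_index:03d}".encode("ascii")


def vehicle_prefix(provider_id: int, dealer_id: int, vin: str) -> bytes:
    """Prefix shared by every image row of one vehicle."""
    # Drop the 3-digit image index from a full key.
    return archive_row_key(provider_id, dealer_id, vin, 0)[:-3]
```

With fixed-width fields, all images of one vehicle sort next to each other, so they could be read with a single prefix scan.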
HBase Tuning
● Pre-split tables
● Keep Column names short (2-8 letters)
● Region size 8-10 GB
● Asynchronous clients should buffer Put operations (autoFlush=false)
● Disable periodic Major Compaction
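One way to choose split points for a pre-split table is to divide the leading character of the row key into equal ranges. The sketch below assumes the IMAGE table's VIN row keys start with a character from the VIN alphabet (digits and capital letters without I, O, Q) and that keys are spread evenly across it, which a real inventory may not satisfy. split_points is a hypothetical helper, not an HBase API.

```python
VIN_ALPHABET = "0123456789ABCDEFGHJKLMNPRSTUVWXYZ"  # no I, O, Q


def split_points(alphabet: str, num_regions: int) -> list[bytes]:
    """Return num_regions - 1 single-character boundary keys."""
    if not 1 <= num_regions <= len(alphabet):
        raise ValueError("num_regions must be between 1 and len(alphabet)")
    ordered = sorted(alphabet)
    step = len(ordered) / num_regions
    # Each boundary is the first character of the next region.
    return [ordered[int(i * step)].encode("ascii") for i in range(1, num_regions)]
```

The resulting keys would be passed to the table-creation call as the initial region boundaries.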
Pipeline Dataflow Overview
[Diagram] InventoryProcessor Output -> [Mapper] Parse & Validate Records -> [Reducer] Identify CRUD Operation -> HBase and Kafka; Kafka -> Asset Servers
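CRUD in Reducer
[Flowchart]
● Is Deleted? Yes -> Delete Row in HBase
● Is Deleted? No -> Is Insert?
● Is Insert? Yes -> Download Images -> Generate 6 Sizes of Image -> 1. Write to HBase 2. Write to Kafka
● Is Insert? No -> Get HTTP Headers of ImageURL and Compare with Existing -> Header Mismatch?
● Header Mismatch? No -> Do Nothing
● Header Mismatch? Yes -> 1. Write to HBase 2. Write to Kafka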
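The CRUD decision in the reducer can be sketched as a small pure function. How the job tells an insert from an update, and which HTTP headers it compares, are not given in the slides. Here an insert is assumed to mean "no headers stored yet", and a header mismatch is assumed to take the same download-and-resize path as an insert. All names are hypothetical.

```python
from enum import Enum
from typing import Mapping, Optional


class Action(Enum):
    DELETE_ROW = "delete row in HBase"
    PROCESS_AND_WRITE = "download, resize, write to HBase, write to Kafka"
    DO_NOTHING = "do nothing"


def decide_action(
    is_deleted: bool,
    stored_headers: Optional[Mapping[str, str]],
    current_headers: Mapping[str, str],
) -> Action:
    """Pick the action for one image record."""
    if is_deleted:
        return Action.DELETE_ROW
    if stored_headers is None:
        # Assumption: nothing stored yet means the record is an insert.
        return Action.PROCESS_AND_WRITE
    if dict(stored_headers) != dict(current_headers):
        # Headers changed, so the image behind the URL is treated as updated.
        return Action.PROCESS_AND_WRITE
    return Action.DO_NOTHING
```

Comparing headers first means an unchanged image costs one lightweight HTTP request instead of a full download.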
Cascading Downloads
[Flowchart]
● One JVM Process [ChainReducer]: ImageProcessorReducer, ImageProcessorMapper, ImageProcessorRetryMapper
● Socket timeout in 500 milliseconds? No -> 1. Write to HBase 2. Write to Kafka
● Socket timeout in 500 milliseconds? Yes -> retry with a longer timeout
● Socket timeout in 5 seconds? Yes -> Mark URL as “Cannot Process”
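A minimal sketch of the two-tier timeout idea: try each URL with a short socket timeout first, and retry only the slow ones with a longer timeout. The fetch callable and the function name are hypothetical stand-ins. The actual jobs run as a ChainReducer in one JVM, which this Python sketch does not model.

```python
import socket
from typing import Callable, Optional, Sequence


def cascading_download(
    url: str,
    fetch: Callable[[str, float], bytes],
    timeouts_s: Sequence[float] = (0.5, 5.0),
) -> Optional[bytes]:
    """Try fetch(url, timeout) with each timeout in turn.

    Returns the image bytes, or None when every attempt timed out, in
    which case the caller would mark the URL as "Cannot Process".
    """
    for timeout in timeouts_s:
        try:
            return fetch(url, timeout)
        except socket.timeout:
            continue  # fall through to the next, longer timeout
    return None
```

Fast hosts are handled in the cheap first pass, so slow hosts delay only the retry stage rather than every download.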
Kafka Producer
● One message per Image file
● Producer Message Format:
    ○ Key: ImageFileName (kafka.serializer.StringEncoder)
    ○ Value: Image (kafka.serializer.DefaultEncoder)
● Example:
    ○ Key: /inventory/10584/15/5YJSA1DP0DFP1156/6ZBQHFKBVMY7OTBO-251.jpg
    ○ Value: [image bytes]
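The message format can be pictured as one (key, value) pair of byte strings per image file. The sketch below mirrors what the two encoders do, assuming StringEncoder's default UTF-8 encoding and DefaultEncoder's pass-through of raw bytes. to_message is a hypothetical helper and does not talk to Kafka.

```python
def to_message(image_file_name: str, image_bytes: bytes) -> tuple[bytes, bytes]:
    """Build the (key, value) pair for one image file."""
    key = image_file_name.encode("utf-8")  # StringEncoder: text to UTF-8 bytes
    value = bytes(image_bytes)             # DefaultEncoder: bytes unchanged
    return key, value
```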
Kafka Producer Tuning
Property | Value | Default Value
request.required.acks | 1 | 0
message.send.max.retries | 30 | 3
retry.backoff.ms | 5000 | 100
client.id | HIP | “”
For the producer to sustain a node failure:
retry.backoff.ms * message.send.max.retries (default: 100 * 3) > ZooKeeper timeout (default: 60000)
With the defaults: failure recovery in 300 ms. Really?
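The rule above is plain arithmetic and can be checked directly. The sketch uses the values from the table and the 60000 ms timeout quoted on the slide; the function name is hypothetical.

```python
def retry_window_ms(retry_backoff_ms: int, max_retries: int) -> int:
    """Total time the producer keeps retrying before giving up."""
    return retry_backoff_ms * max_retries


ZOOKEEPER_TIMEOUT_MS = 60_000  # value quoted on the slide

default_window = retry_window_ms(100, 3)   # default settings
tuned_window = retry_window_ms(5000, 30)   # settings used by HIP

default_survives = default_window > ZOOKEEPER_TIMEOUT_MS
tuned_survives = tuned_window > ZOOKEEPER_TIMEOUT_MS
```

With the defaults the producer gives up long before a failed node is detected, while the tuned values keep it retrying past the timeout.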
Kafka Brokers Tuning
Property | Value | Default Value
log.retention.bytes | 24 GB | -1 (unlimited)
socket.send.buffer.bytes | 10485760 | 1048576
socket.receive.buffer.bytes | 10485760 | 1048576
1. Data is purged when either log.retention.bytes or log.retention.hours is exceeded.
2. log.retention.bytes = disk space / number_of_partitions on each node
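A worked example of the sizing rule. Only the 24 GB result comes from the slide; the 240 GB of disk and 10 partitions per node are hypothetical inputs chosen to reproduce it, and "GB" is assumed to mean GiB.

```python
GIB = 1024 ** 3


def log_retention_bytes(disk_bytes_per_node: int, partitions_per_node: int) -> int:
    """Disk space on a node divided evenly among the partitions it hosts."""
    if partitions_per_node <= 0:
        raise ValueError("partitions_per_node must be positive")
    return disk_bytes_per_node // partitions_per_node


retention = log_retention_bytes(240 * GIB, 10)
```

In practice some disk would be left as headroom rather than divided completely.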
OpenCV
● Used the Java bindings of OpenCV to avoid using Hadoop Streaming
● The Java API makes it straightforward to encode, decode, crop, and resize images
Memory Leak:
Mat.release() must be called to free the memory used by a Mat.
Performance
[Chart: hours (y-axis, 0-400) against images in millions (x-axis, 3-18) for HIP and ImageProcessor 1.0]
HIP scales linearly and is at least 10x faster.
Cascading Downloads
[Chart: hours (y-axis, 0-14) against images in millions (x-axis, 3-18) for HIP with Cascading and HIP without Cascading]
20% performance gain
FUTURE…
Machine Learning!
Thanks!
Questions?
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 


A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics

  • 2. Image Processing Pipeline
      ● Acquire Images of Vehicle
      ● Identify updates/deletes to Images
      ● Generate unique URL for Images
      ● Crop and Resize Images
      ● Copy images to Asset Servers
      ● Dedupe Images
  • 4. Why Hadoop?
      ● High Scalability
      ● Store historical data of Images
      ● Fault tolerance
      ● Identify updates to images on basis of content of URL
  • 5. HIP Components
      1. HBase: Datastore for Images and archiving Images
      2. MapReduce: Computation engine for Image Processor
      3. Kafka: Publisher/Subscriber for pushing images to Asset Servers
      4. OpenCV Java: Image Processing library
      5. Avro: Serialization library for storing data on HDFS
  • 6. HBase Data Model
      Tables:
      1. IMAGE: Stores current set of Images with some metadata
      2. IMAGE_ARCHIVE: Stores historical data of Vehicles and Original Images
  • 7. HBase Data Model
      Table: IMAGE
      RowKey: <Vin_Number>

      Column Family | Description                                                  | Versions
      I             | Stores all images of a vehicle; one Image in each Column     | 1
      H             | Stores metadata of all Images                                | 1

      Read patterns for “I” and “H” are mutually exclusive.
  • 8. HBase Data Model
      Table: IMAGE_ARCHIVE
      RowKey: <Provider_id><Dealer_Id><vehicle_vin><Image_Index>

      Column Family | Description                                                  | Versions
      I             | Stores original images of a vehicle; only 1 column is stored | 10
      A             | Stores fields of Avro Object of Vehicle and Image for analytics | 10
  • 9. HBase Tuning
      ● Pre-split tables
      ● Keep Column names short (2-8 letters)
      ● Region size 8-10 GB
      ● Asynchronous clients should buffer (autoFlush=false) Put operations
      ● Disable periodic Major Compaction
  • 10. Pipeline Dataflow Overview (diagram)
      1. InventoryProcessor Output
      2. [Mapper] Parse & Validate Records
      3. [Reducer] Identify CRUD Operation
      4. Kafka and HBase
      5. Asset Servers
  • 11. CRUD in Reducer (flowchart)
      Start → Is Deleted?
        Yes: Delete Row in HBase
        No: Is Insert?
          Yes: Download Images, Generate 6 Sizes of Image
          No: Get HTTP Headers of ImageURL and Compare with Existing → Header Mismatch?
            No: Do Nothing
            Yes: the image is processed as changed
      Final step for processed images:
        1. Write to HBase
        2. Write to Kafka
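The branch order of the CRUD flowchart can be sketched in Java as follows. This is a minimal illustration, not the authors' reducer: the class, enum and parameter names are hypothetical, a missing stored-header map is an assumed convention for "this is an insert", and the slide does not say which HTTP headers are compared.

```java
import java.util.Map;

// Hypothetical sketch of the reducer's decision step; only the branch order
// (delete, then insert, then header comparison) is taken from the flowchart.
final class CrudDecision {
    enum Action { DELETE_ROW, INSERT, UPDATE, DO_NOTHING }

    // storedHeaders == null is an assumed convention meaning "no row exists yet".
    static Action decide(boolean deleted,
                         Map<String, String> storedHeaders,
                         Map<String, String> currentHeaders) {
        if (deleted) {
            return Action.DELETE_ROW;   // Is Deleted? Yes
        }
        if (storedHeaders == null) {
            return Action.INSERT;       // Is Insert? Yes
        }
        // Header mismatch? No: do nothing. Yes: the image has to be processed again.
        return storedHeaders.equals(currentHeaders) ? Action.DO_NOTHING : Action.UPDATE;
    }
}
```

In this sketch both INSERT and UPDATE would lead to the same follow-up work (download, resize, write to HBase, write to Kafka), while DO_NOTHING avoids downloading an unchanged image at all.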
  • 12. Cascading Downloads (flowchart; all stages run in one JVM process via [ChainReducer])
      ImageProcessorReducer → ImageProcessorMapper: Socket timeout in 500 milliseconds?
        No: 1. Write to HBase 2. Write to Kafka
        Yes: ImageProcessorRetryMapper: Socket timeout in 5 seconds?
          No: 1. Write to HBase 2. Write to Kafka
          Yes: Mark URL as “Cannot Process”
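A minimal Java sketch of the two-stage timeout idea, assuming a hypothetical `Fetcher` abstraction in place of the real HTTP download. The slide chains separate mapper classes inside a ChainReducer; this sketch collapses them into one method purely to show the control flow, and every name except the two timeout values is an assumption.

```java
import java.util.Optional;

// Hypothetical sketch of cascading downloads: a fast first attempt, then one
// slower retry, then giving up. Only the 500 ms and 5 s timeouts come from the slide.
final class CascadingDownload {
    // Assumed abstraction: returns the image bytes, or empty on a socket timeout.
    interface Fetcher {
        Optional<byte[]> fetch(String url, int timeoutMillis);
    }

    enum Outcome { FIRST_PASS, RETRY_PASS, CANNOT_PROCESS }

    static final int FIRST_TIMEOUT_MS = 500;
    static final int RETRY_TIMEOUT_MS = 5_000;

    static Outcome download(String url, Fetcher fetcher) {
        if (fetcher.fetch(url, FIRST_TIMEOUT_MS).isPresent()) {
            return Outcome.FIRST_PASS;      // fast hosts finish here
        }
        if (fetcher.fetch(url, RETRY_TIMEOUT_MS).isPresent()) {
            return Outcome.RETRY_PASS;      // slow hosts get one longer attempt
        }
        return Outcome.CANNOT_PROCESS;      // URL is marked "Cannot Process"
    }
}
```

The design choice this illustrates is that the short first timeout keeps slow hosts from stalling the bulk of the downloads, and only the URLs that timed out pay for the longer second attempt.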
  • 13. Kafka Producer
      ● One message per Image file
      ● Producer Message Format:
          ● Key: ImageFileName (kafka.serializer.StringEncoder)
          ● Value: Image (kafka.serializer.DefaultEncoder)
      Example:
        Key: /inventory/10584/15/5YJSA1DP0DFP1156/6ZBQHFKBVMY7OTBO-251.jpg
        Value: [image on the slide]
  • 14. Kafka Producer Tuning

      Property                 | Value | Default Value
      request.required.acks    | 1     | 0
      message.send.max.retries | 30    | 3
      retry.backoff.ms         | 5000  | 100
      client.id                | HIP   | “”

      For the Producer to sustain a NODE failure:
        retry.backoff.ms * message.send.max.retries (default: 100 * 3) > Zookeeper Timeout (default: 60000)
      Failure recovery in 300 ms. Really?
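The rule on the producer tuning slide is plain arithmetic, and a short Java sketch makes the gap between the defaults and the tuned values visible. The property names, tuned values and the 60000 ms ZooKeeper timeout are the ones quoted on the slide; the class and method names are hypothetical, and the sketch only computes the retry window, it does not create a Kafka producer.

```java
import java.util.Properties;

// Sketch: checks the slide's rule that the producer's total retry window
// should exceed the ZooKeeper timeout so the producer can ride out a node failure.
final class ProducerRetryWindow {
    static final long ZK_TIMEOUT_MS = 60_000L;   // default quoted on the slide

    // Property names and tuned values as listed on the slide.
    static Properties tunedProducerProps() {
        Properties p = new Properties();
        p.setProperty("request.required.acks", "1");
        p.setProperty("message.send.max.retries", "30");
        p.setProperty("retry.backoff.ms", "5000");
        p.setProperty("client.id", "HIP");
        return p;
    }

    // Falls back to the defaults quoted on the slide when a property is unset.
    static long retryWindowMs(Properties p) {
        long backoff = Long.parseLong(p.getProperty("retry.backoff.ms", "100"));
        long retries = Long.parseLong(p.getProperty("message.send.max.retries", "3"));
        return backoff * retries;
    }

    public static void main(String[] args) {
        System.out.println(retryWindowMs(new Properties()));      // defaults: 100 * 3
        System.out.println(retryWindowMs(tunedProducerProps()));  // tuned: 5000 * 30
    }
}
```

With the defaults the window is 300 ms, far below the 60000 ms timeout; with the tuned values it is 150000 ms, which satisfies the inequality.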
  • 15. Kafka Brokers Tuning

      Property                    | Value    | Default Value
      log.retention.bytes         | 24 GB    | -1 (unlimited)
      socket.send.buffer.bytes    | 10485760 | 1048576
      socket.receive.buffer.bytes | 10485760 | 1048576

      1. Data is purged when either log.retention.bytes or log.retention.hours is exceeded.
      2. log.retention.bytes = diskspace / number_of_partitions on each node
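The sizing rule for log.retention.bytes can be worked through with a small Java sketch. The disk size and partition count below are hypothetical numbers chosen only because they reproduce the 24 GB value from the table; the slide does not state the actual cluster figures, and the class and method names are illustrative.

```java
// Sketch of the sizing rule: per-partition retention = disk space / partitions on the node.
final class RetentionSizing {
    static long retentionBytes(long diskBytesForLogs, int partitionsOnNode) {
        if (partitionsOnNode <= 0) {
            throw new IllegalArgumentException("partitionsOnNode must be positive");
        }
        return diskBytesForLogs / partitionsOnNode;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        // Hypothetical node: 480 GiB reserved for Kafka logs, 20 partitions.
        System.out.println(retentionBytes(480L * gib, 20) / gib + " GiB per partition");
    }
}
```

Because the limit applies per partition, dividing the available disk by the number of partitions hosted on a node keeps the total log size on that node within the disk budget.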
  • 16. OpenCV
      ● Used the Java bindings of OpenCV to avoid using Hadoop Streaming
      ● The Java API is quite straightforward for encoding, decoding, cropping and resizing
      Memory Leak: Mat.release() has to be called to free the memory used by a Mat.
  • 17. Performance
      [Chart: Hours (0 to 400) vs. Images (Millions, 3 to 18); series: HIP, ImageProcessor1.0]
      HIP scales linearly and is at least 10x faster.
  • 18. Cascading Downloads
      [Chart: Hours (0 to 14) vs. Images (Millions, 3 to 18); series: HIP with Cascading, HIP without Cascading]
      20% performance gain.
