Hadoop Image Processing Pipeline (HIP)
June 10, 2015
Russell Foltz-Smith
Anil Gupta
Image Processing Pipeline
● Acquire Images of Vehicle
● Identify updates/deletes to Images
● Generate unique URL for Images
● Crop and Resize Images
● Copy images to Asset Servers
● Dedupe Images
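The slides do not say how HIP detects duplicate images. As one possible illustration of the dedupe step, the sketch below treats byte-identical files as duplicates by comparing content hashes; `dedupe_images` is a hypothetical helper, not part of HIP.

```python
import hashlib


def dedupe_images(images):
    """Drop byte-identical duplicates, keeping the first occurrence of each.

    `images` is an iterable of (name, content_bytes) pairs.
    Assumption: "duplicate" means identical bytes, nothing more.
    """
    seen = set()
    unique = []
    for name, content in images:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # same bytes were already kept under another name
        seen.add(digest)
        unique.append((name, content))
    return unique
```

Visually similar but re-encoded images would not be caught by this approach; that would need a perceptual comparison instead.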
Image Processing Pipeline Example
HIP
Why Hadoop?
● High Scalability
● Store historical data of Images
● Fault tolerance
● Identify updates to images on the basis of the content of the URL
HIP Components
1. HBase: Datastore for Images and archiving Images
2. MapReduce: Computation engine for Image Processor
3. Kafka: Publisher/Subscriber for pushing images to Asset Servers
4. OpenCV Java: Image Processing library
5. Avro: Serialization library for storing data on HDFS
HBase Data Model
Tables:
1. IMAGE: Stores the current set of Images with some metadata
2. IMAGE_ARCHIVE: Stores historical data of Vehicles and Original Images
HBase Data Model
Table: IMAGE
RowKey: <Vin_Number>
Column Family | Description | Versions
I | Stores all images of a vehicle, one image per column | 1
H | Stores metadata of all images | 1
Read patterns for “I” and “H” are mutually exclusive
HBase Data Model
Table: IMAGE_ARCHIVE
RowKey: <Provider_id><Dealer_Id><vehicle_vin><Image_Index>
Column Family | Description | Versions
I | Stores original images of a vehicle; only 1 column is stored | 10
A | Stores fields of the Avro object of Vehicle and Image for analytics | 10
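The IMAGE_ARCHIVE row key is a concatenation of four parts. The slide gives only their order, so the sketch below is one way such a key might be built; the zero-padding, the field widths and the function name `archive_row_key` are assumptions made for illustration.

```python
def archive_row_key(provider_id, dealer_id, vin, image_index,
                    id_width=8, index_width=3):
    """Concatenate the key parts in the order shown on the slide.

    Assumption: numeric parts are zero-padded to fixed widths so that
    all rows of one vehicle sort together and in image order.
    """
    return (
        str(provider_id).zfill(id_width)
        + str(dealer_id).zfill(id_width)
        + vin
        + str(image_index).zfill(index_width)
    ).encode("utf-8")


# Hypothetical identifiers, not taken from the slides.
key = archive_row_key(42, 7, "TESTVIN0000000001", 3)
```

With fixed-width fields, a prefix scan on provider, dealer and VIN returns every archived image of one vehicle.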
HBase Tuning
● Pre-split tables
● Keep column names short (2-8 letters)
● Region size 8-10 GB
● Asynchronous clients should buffer Put operations (autoFlush=false)
● Disable periodic Major Compaction
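The buffering advice can be illustrated without an HBase client. The sketch below is plain Python, not the HBase API: `send_batch` is a hypothetical stand-in for the network call, and the 2 MB threshold is an assumed value. It shows the idea behind autoFlush=false, which is to collect Puts on the client and ship them in batches.

```python
class BufferedWriter:
    """Collect puts and send them in batches instead of one call per put."""

    def __init__(self, send_batch, max_buffered_bytes=2 * 1024 * 1024):
        self._send_batch = send_batch         # hypothetical: takes a list of (row, value)
        self._max_bytes = max_buffered_bytes  # flush threshold (assumed value)
        self._buffer = []
        self._buffered_bytes = 0

    def put(self, row, value):
        """Queue one put; flush automatically once the buffer is full."""
        self._buffer.append((row, value))
        self._buffered_bytes += len(row) + len(value)
        if self._buffered_bytes >= self._max_bytes:
            self.flush()

    def flush(self):
        """Send everything queued so far as a single batch."""
        if self._buffer:
            self._send_batch(self._buffer)
            self._buffer = []
            self._buffered_bytes = 0
```

A caller has to invoke `flush()` once at the end of a task, otherwise the last partial batch is never sent.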
Pipeline Dataflow Overview
[Diagram: InventoryProcessor ([Mapper] Parse & Validate Records, [Reducer] Identify CRUD Operation); Output: Kafka, HBase; Asset Servers]
CRUD in Reducer
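[Flowchart]
● Start: Is Deleted? Yes: Delete Row in HBase
● Is Deleted? No: Is Insert?
● Is Insert? Yes: Download Images, then Generate 6 Sizes of Image
● Is Insert? No: Get HTTP Headers of ImageURL and Compare with Existing
● Header Mismatch? No: Do Nothing
● Header Mismatch? Yes: continue to the processing and write steps
● Final steps: 1. Write to HBase 2. Write to Kafka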
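The decision logic of the flowchart can be sketched as a small pure function. This is an illustration under assumptions, not HIP's reducer code: the record field names, the `fetch_headers` callable and the returned action labels are hypothetical, and the slides do not say which HTTP headers are compared.

```python
def crud_action(record, stored_headers, fetch_headers):
    """Decide what to do with one image record.

    record: dict with 'url' and a boolean 'deleted' (assumed field names).
    stored_headers: headers saved on a previous run, or None for a new image.
    fetch_headers: callable(url) -> dict of HTTP headers (hypothetical).
    """
    if record["deleted"]:
        return "delete"   # delete the row in HBase
    if stored_headers is None:
        return "insert"   # new image: download, resize, write
    if fetch_headers(record["url"]) != stored_headers:
        return "update"   # header mismatch: reprocess and write
    return "skip"         # headers match: do nothing
```

Comparing headers instead of image bytes means unchanged images cost one small HTTP request rather than a full download.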
Cascading Downloads
[Flowchart: stages run as a [ChainReducer] in One JVM Process: ImageProcessorReducer, ImageProcessorMapper, ImageProcessorRetryMapper]
● Socket timeout in 500 milliseconds? No: 1. Write to HBase 2. Write to Kafka
● Socket timeout in 500 milliseconds? Yes: pass the URL on to the next stage
● Socket timeout in 5 seconds? Yes: Mark URL as “Cannot Process”
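The cascading idea, a quick first attempt followed by a more patient retry, can be sketched as follows. The 0.5 s and 5 s values come from the flowchart; `fetch` is a hypothetical callable that returns the image bytes or raises `TimeoutError`, and the status strings are assumptions.

```python
def cascading_download(url, fetch, timeouts=(0.5, 5.0)):
    """Try fetch(url, timeout) with increasing socket timeouts.

    Returns ("ok", data) on the first success, or
    ("cannot_process", None) if every stage times out.
    """
    for timeout in timeouts:
        try:
            return "ok", fetch(url, timeout)
        except TimeoutError:
            continue  # hand the URL to the next, more patient stage
    return "cannot_process", None
```

The short first timeout keeps a slow host from holding up the URLs that respond quickly; only the stragglers pay for the longer wait.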
Kafka Producer
● One message per Image file
● Producer Message Format:
● Key: ImageFileName (kafka.serializer.StringEncoder)
● Value: Image (kafka.serializer.DefaultEncoder)
Example:
Key: /inventory/10584/15/5YJSA1DP0DFP1156/6ZBQHFKBVMY7OTBO-251.jpg
Value: [image]
Kafka Producer Tuning
Property | Value | Default Value
request.required.acks | 1 | 0
message.send.max.retries | 30 | 3
retry.backoff.ms | 5000 | 100
client.id | HIP | “”
For the producer to sustain a node failure:
retry.backoff.ms * message.send.max.retries > ZooKeeper timeout (default: 60000)
With the defaults (100 * 3), failure recovery would have to finish within 300 ms. Really?
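The inequality above is easy to check with the numbers from the table. A small sketch; the function name `retry_window_ms` is made up for illustration, and the 60000 ms ZooKeeper timeout is the figure quoted on the slide.

```python
def retry_window_ms(retry_backoff_ms, max_retries):
    """Total time the producer keeps retrying before it gives up."""
    return retry_backoff_ms * max_retries


ZK_TIMEOUT_MS = 60000  # ZooKeeper timeout as quoted on the slide

default_window = retry_window_ms(100, 3)    # default settings
tuned_window = retry_window_ms(5000, 30)    # HIP settings from the table

survives_default = default_window > ZK_TIMEOUT_MS
survives_tuned = tuned_window > ZK_TIMEOUT_MS
```

The defaults give a 300 ms window, far below the timeout, while the tuned values give 150000 ms, which leaves room for the failure to be detected before the producer runs out of retries.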
Kafka Brokers Tuning
Property | Value | Default Value
log.retention.bytes | 24 GB | -1 (unlimited)
socket.send.buffer.bytes | 10485760 | 1048576
socket.receive.buffer.bytes | 10485760 | 1048576
1. Data is purged when either log.retention.bytes or log.retention.hours is exceeded.
2. log.retention.bytes = disk space / number of partitions on each node
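Rule 2 is plain division, since log.retention.bytes applies to each partition separately. A worked sketch with hypothetical numbers: the slides give the resulting 24 GB but not the disk size or partition count, so the 240 GB and 10 partitions below are assumptions chosen to arrive at that figure.

```python
GB = 1024 ** 3


def log_retention_bytes(disk_bytes_per_node, partitions_per_node):
    """Per-partition retention limit that lets all partitions fit on disk."""
    return disk_bytes_per_node // partitions_per_node


# Hypothetical node: 240 GB of log disk shared by 10 partitions.
per_partition = log_retention_bytes(240 * GB, 10)
```

In practice some headroom below this value is sensible, because segments are only deleted after the limit has been crossed.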
OpenCV
● Used the Java bindings of OpenCV to avoid using Hadoop Streaming
● The Java API is quite straightforward for encoding, decoding, cropping and resizing
Memory Leak:
Mat.release() has to be called to free the memory used by a Mat.
Performance
[Chart: Hours (0 to 400) against Images (Millions) (3 to 18); series: HIP, ImageProcessor1.0]
HIP scales linearly and is at least 10x faster
Cascading Downloads
[Chart: Hours (0 to 14) against Images (Millions) (3 to 18); series: HIP with Cascading, HIP without Cascading]
20% performance gain
FUTURE…
Machine Learning!
Thanks!
Questions?
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics

  • 2. Image Processing Pipeline
      ● Acquire Images of Vehicle
      ● Identify updates/deletes to Images
      ● Generate unique URL for Images
      ● Crop and Resize Images
      ● Copy images to Asset Servers
      ● Dedupe Images
  • 4. Why Hadoop?
      ● High Scalability
      ● Store historical data of Images
      ● Fault tolerance
      ● Identify updates to images on basis of content of URL
  • 5. HIP Components
      1. HBase: Datastore for Images and archiving Images
      2. MapReduce: Computation engine for Image Processor
      3. Kafka: Publisher/Subscriber for pushing images to Asset Servers
      4. OpenCV Java: Image Processing library
      5. Avro: Serialization library for storing data on HDFS
  • 6. HBase Data Model
      Tables:
      1. IMAGE: Store current set of Images with some metadata
      2. IMAGE_ARCHIVE: Stores historical data of Vehicles and Original Images
  • 7. HBase Data Model
      Table: IMAGE
      RowKey: <Vin_Number>
      Column Family   Description                                                   Versions
      I               Store all images of vehicle; stores an Image in each Column   1
      H               Stores metadata of all Images                                 1
      Read patterns for “I” and “H” are mutually exclusive
  • 8. HBase Data Model
      Table: IMAGE_ARCHIVE
      RowKey: <Provider_id><Dealer_Id><vehicle_vin><Image_Index>
      Column Family   Description                                                       Versions
      I               Store original images of vehicle; only 1 column is stored         10
      A               Stores fields of Avro Object of Vehicle and Image for analytics   10
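The composite row key of IMAGE_ARCHIVE on slide 8 could be assembled roughly as follows. The slides give only the field order; the zero-padded fixed widths (5 digits for the provider, 7 for the dealer, 2 for the image index) and the class name `ArchiveRowKey` are assumptions made for illustration, not the authors' encoding.

```java
import java.util.Locale;

final class ArchiveRowKey {
    // Field order follows the slide: <Provider_id><Dealer_Id><vehicle_vin><Image_Index>.
    // The padding widths are hypothetical; fixed widths keep keys for the same
    // provider and dealer adjacent, so a prefix scan can retrieve them together.
    static String build(int providerId, int dealerId, String vin, int imageIndex) {
        if (providerId < 0 || dealerId < 0 || imageIndex < 0) {
            throw new IllegalArgumentException("ids and index must be non-negative");
        }
        if (vin == null || vin.isEmpty()) {
            throw new IllegalArgumentException("vin must be non-empty");
        }
        return String.format(Locale.ROOT, "%05d%07d%s%02d", providerId, dealerId, vin, imageIndex);
    }

    public static void main(String[] args) {
        // Hypothetical values, only to show the resulting layout.
        System.out.println(build(42, 7, "VIN123", 3));
    }
}
```

Because the image index is the last component, all versions of one vehicle's images sort next to each other under the same provider and dealer prefix.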
  • 9. HBase Tuning
      ● Pre-split tables
      ● Keep Column names short (2-8 letters)
      ● Region size 8-10 GB
      ● Asynchronous clients should buffer (autoFlush=false) Put operations
      ● Disable periodic Major Compaction
  • 10. Pipeline Dataflow Overview
      InventoryProcessor Output -> [Mapper] Parse & Validate Records -> [Reducer] Identify CRUD Operation -> HBase, Kafka -> Asset Servers
  • 11. CRUD in Reducer
      Start
      Is Deleted?
        Yes: Delete Row in HBase
        No: Is Insert?
          Yes: Download Images -> Generate 6 Sizes of Image -> 1. Write to HBase 2. Write to Kafka
          No: Get HTTP Headers of ImageURL and Compare with Existing -> Header Mismatch?
            No: Do Nothing
            Yes: Download Images -> Generate 6 Sizes of Image -> 1. Write to HBase 2. Write to Kafka
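The reducer's decision flow on slide 11 can be sketched as a small pure function. This is a reading of the flowchart rather than the authors' code: it assumes that "insert" means no row exists yet for the vehicle, that a header mismatch leads to the same download path as an insert, and that the compared headers are reduced to a single fingerprint string. The names `CrudDecision` and `Action` are hypothetical.

```java
import java.util.Objects;

final class CrudDecision {
    enum Action { DELETE_ROW, DOWNLOAD_AND_PUBLISH, DO_NOTHING }

    // storedHeaders / currentHeaders stand in for whatever HTTP headers the
    // pipeline compares; the slides do not say which ones (assumption).
    static Action decide(boolean deleted, boolean rowExists,
                         String storedHeaders, String currentHeaders) {
        if (deleted) {
            return Action.DELETE_ROW;              // "Is Deleted?" -> Yes
        }
        if (!rowExists) {
            return Action.DOWNLOAD_AND_PUBLISH;    // "Is Insert?" -> Yes
        }
        boolean mismatch = !Objects.equals(storedHeaders, currentHeaders);
        return mismatch ? Action.DOWNLOAD_AND_PUBLISH  // headers changed: re-download
                        : Action.DO_NOTHING;           // headers unchanged
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true, "etag-1", "etag-2"));
    }
}
```

Comparing headers first means an unchanged image costs one lightweight HTTP request instead of a full download and resize.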
  • 12. Cascading Downloads
      One JVM Process: [ChainReducer] ImageProcessorReducer -> ImageProcessorMapper -> ImageProcessorRetryMapper
      ImageProcessorMapper: Socket timeout in 500 milliseconds?
        No: 1. Write to HBase 2. Write to Kafka
        Yes: ImageProcessorRetryMapper: Socket timeout in 5 seconds?
          No: 1. Write to HBase 2. Write to Kafka
          Yes: Mark URL as “Cannot Process”
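The two-stage timeout on slide 12 amounts to trying a short timeout first and escalating only for slow URLs. A minimal sketch, assuming a hypothetical `Downloader` interface in place of the real HTTP client and collapsing the two chained mappers into one loop:

```java
import java.net.SocketTimeoutException;
import java.util.Optional;

final class CascadingDownload {
    // Hypothetical abstraction over the HTTP client; not part of the slides.
    interface Downloader {
        byte[] fetch(String url, int timeoutMillis) throws SocketTimeoutException;
    }

    static final int FAST_TIMEOUT_MS = 500;    // first attempt (slide: 500 milliseconds)
    static final int SLOW_TIMEOUT_MS = 5_000;  // retry attempt (slide: 5 seconds)

    // Returns the image bytes, or empty if both attempts time out, in which
    // case the caller would mark the URL as "Cannot Process".
    static Optional<byte[]> download(String url, Downloader downloader) {
        for (int timeoutMs : new int[] {FAST_TIMEOUT_MS, SLOW_TIMEOUT_MS}) {
            try {
                return Optional.of(downloader.fetch(url, timeoutMs));
            } catch (SocketTimeoutException e) {
                // Timed out at this stage; fall through to the next, longer timeout.
            }
        }
        return Optional.empty();
    }
}
```

Fast hosts are served within the short timeout, so slow hosts no longer hold up the first pass; only the timed-out URLs pay the longer wait.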
  • 13. Kafka Producer
      ● One message per Image file
      ● Producer Message Format:
          ● Key: ImageFileName (kafka.serializer.StringEncoder)
          ● Value: Image (kafka.serializer.DefaultEncoder)
      Key: /inventory/10584/15/5YJSA1DP0DFP1156/6ZBQHFKBVMY7OTBO-251.jpg
      Value: [image]
  • 14. Kafka Producer Tuning
      Property                   Value   Default Value
      request.required.acks      1       0
      message.send.max.retries   30      3
      retry.backoff.ms           5000    100
      client.id                  HIP     “”
      For Producer, to sustain NODE failure:
      retry.backoff.ms * message.send.max.retries (default: 100*3) > Zookeeper Timeout (default: 60000)
      Failure recovery in 300ms. Really?
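The inequality on slide 14 is easy to check numerically. The sketch below uses the property names and values from the slide; the class and method names are illustrative, and the 60000 ms Zookeeper timeout is the default quoted on the slide.

```java
import java.util.Properties;

final class ProducerRetryWindow {
    static final long ZOOKEEPER_TIMEOUT_MS = 60_000L; // default quoted on the slide

    // Total time the producer keeps retrying: backoff multiplied by retry count.
    // Missing properties fall back to the defaults listed on the slide.
    static long retryWindowMs(Properties props) {
        long backoffMs = Long.parseLong(props.getProperty("retry.backoff.ms", "100"));
        long retries = Long.parseLong(props.getProperty("message.send.max.retries", "3"));
        return backoffMs * retries;
    }

    static boolean survivesNodeFailure(Properties props) {
        return retryWindowMs(props) > ZOOKEEPER_TIMEOUT_MS;
    }

    // The tuned values from the slide's table.
    static Properties tunedProducerProps() {
        Properties p = new Properties();
        p.setProperty("request.required.acks", "1");
        p.setProperty("message.send.max.retries", "30");
        p.setProperty("retry.backoff.ms", "5000");
        p.setProperty("client.id", "HIP");
        return p;
    }

    public static void main(String[] args) {
        System.out.println("default window ms: " + retryWindowMs(new Properties()));
        System.out.println("tuned window ms:   " + retryWindowMs(tunedProducerProps()));
    }
}
```

With the defaults the producer gives up after 100 * 3 = 300 ms, well short of the 60000 ms timeout; with the tuned values the window is 5000 * 30 = 150000 ms, which exceeds it.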
  • 15. Kafka Brokers Tuning
      Property                      Value      Default Value
      log.retention.bytes           24 GB      -1 (unlimited)
      socket.send.buffer.bytes      10485760   1048576
      socket.receive.buffer.bytes   10485760   1048576
      1. Data is purged when either log.retention.bytes or log.retention.hours is exceeded.
      2. log.retention.bytes = diskspace/number_of_partitions on each node
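The sizing rule in note 2 of slide 15 is a single division, since log.retention.bytes applies to each partition's log. A minimal sketch; the disk size and partition count in `main` are hypothetical figures chosen only so that the result matches the 24 GB on the slide, not numbers from the talk.

```java
final class RetentionSizing {
    // log.retention.bytes = diskspace / number_of_partitions on each node
    static long retentionBytes(long diskBytesPerNode, int partitionsPerNode) {
        if (diskBytesPerNode <= 0 || partitionsPerNode <= 0) {
            throw new IllegalArgumentException("disk size and partition count must be positive");
        }
        return diskBytesPerNode / partitionsPerNode;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // Hypothetical node: 240 GiB reserved for Kafka logs, 10 partitions hosted.
        System.out.println(retentionBytes(240 * gib, 10) / gib + " GiB per partition");
    }
}
```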
  • 16. OpenCV
      ● Used Java bindings of OpenCV to avoid using Hadoop Streaming
      ● The Java API is quite straightforward for encoding, decoding, cropping and resizing.
      Memory Leak: Mat.release() has to be used to free up memory used by Mat.
  • 17. Performance
      [Chart: Hours (0-400) vs. Images in Millions (3-18); series: HIP, ImageProcessor1.0]
      HIP scales linearly and is at least 10x faster
  • 18. Cascading Downloads
      [Chart: Hours (0-14) vs. Images in Millions (3-18); series: HIP with Cascading, HIP without Cascading]
      20% performance gain
