Ten things to consider for Interactive Analytics on high volume, write-once workloads
Full talk and demo at Fifth Elephant 2014
Abinash Karan
abinash@Bizosys.com
www.bizosys.com
About
• CTO and Co-Founder at Bizosys Technologies since 2009
• Created HSearch, a real-time, distributed search and analytics engine built on the Hadoop platform
• Passionate about distributed systems and data structures
• Speaker at Fifth Elephant 2013, Microsoft TechEd 2012, Yahoo Hadoop India Summit 2011
• Developed the partitioning and read-optimized data structure modules for HSearch
• Worked with a range of search products including Lucene, Solr, Endeca and FAST
• Abinash is an engineering graduate of NIT, Rourkela
Summary of what you will hear
CONTEXT: Write-once data load, e.g. time-series data.
Which Database?
1. SSD is Good
2. MPP is Good
3. Columnar is Good
4. Logical Partition is Good
5. Data Skew Partition is Good
6. Search Engine Index could lead to Index Explosion
7. Concurrent Users First, Single Query Performance Next
8. High Throughput File level Snapshot Loading
9. Calculate cost upfront
10. Data Structure makes a Big Difference
Which Database?
HBase, MongoDB, Shark, SAP HANA, HSearch, Riak, Hive, Dremel, 1010data, Memcached, FoundationDB, Splunk, Elasticsearch, DynamoDB, Datameer, LevelDB, Netezza, Oracle TimesTen, Aerospike, Sybase IQ, Vertica, Accumulo, Hypertable, Solr
Concept#1 SSD and RAM are Good.
Data Node: Application Server, DB Instance (Database Node)
Access latency:
• Network: 50 microseconds
• Disk access: 20 milliseconds
• SSD: 100 microseconds
• RAM: 100 nanoseconds
Data hotness based caching
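To make the latency figures above concrete, here is a minimal sketch of the average access time when hot data is served from RAM or SSD and the rest falls through to disk. The hit ratios and the function name `effective_latency` are illustrative assumptions, not numbers from the talk.

```python
# Latency figures as listed on the slide, converted to seconds.
LATENCY = {
    "ram": 100e-9,   # 100 nanoseconds
    "ssd": 100e-6,   # 100 microseconds
    "disk": 20e-3,   # 20 milliseconds
}


def effective_latency(hit_ram, hit_ssd):
    """Average access time for a three-tier cache (hypothetical hit ratios).

    hit_ram and hit_ssd are the fractions of reads served by RAM and SSD;
    whatever is left is assumed to go to disk.
    """
    miss = 1.0 - hit_ram - hit_ssd
    if miss < -1e-9:
        raise ValueError("hit ratios must not sum to more than 1")
    miss = max(miss, 0.0)  # absorb tiny floating-point error
    return (hit_ram * LATENCY["ram"]
            + hit_ssd * LATENCY["ssd"]
            + miss * LATENCY["disk"])


if __name__ == "__main__":
    # Assumed example: 80% of reads hit RAM, 15% hit SSD, 5% go to disk.
    print(effective_latency(0.80, 0.15))
    # Everything on disk, for comparison.
    print(effective_latency(0.0, 0.0))
```

Even a small share of disk reads dominates the average, which is one way to read the case for hotness-based caching.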
Concept#2 MPP is Good
Diagram labels: Application Server, MPP Node (Computed Data), Database Node, DISK (All Data), SSD, RAM
MPP Processing?
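As a rough illustration of the MPP idea (each node computes over its own data and only the computed result travels back), the sketch below simulates nodes with threads. The names `node_partial_sum` and `mpp_sum` are hypothetical, and a real MPP engine would run these steps on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def node_partial_sum(rows):
    """Work done locally on one node: aggregate only that node's rows."""
    return sum(rows)


def mpp_sum(shards):
    """Scatter the aggregation to every shard, then gather and combine.

    Threads stand in for MPP nodes here; only the small partial results
    are returned to the caller, not the raw rows.
    """
    if not shards:
        return 0
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(node_partial_sum, shards))
    return sum(partials)


if __name__ == "__main__":
    shards = [[1, 2, 3], [4, 5], [6]]  # three simulated nodes
    print(mpp_sum(shards))
```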
Concept#3 Columnar is Good
12 2 2 8 4
12
228 bytes
Opens 84 Bytes*
* Filter on Col1 and Display Col6
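The saving from a columnar layout can be sketched by counting the bytes a query has to open. The column widths below are hypothetical and are not the figures from the slide; the point is only that a column store reads the columns a query touches, while a row store reads whole rows.

```python
# Hypothetical fixed column widths in bytes for a six-column table.
COLUMN_WIDTHS = [4, 8, 8, 4, 16, 8]


def bytes_read_row_store(n_rows, widths=COLUMN_WIDTHS):
    """A row store opens every column of every row it scans."""
    return n_rows * sum(widths)


def bytes_read_column_store(n_rows, columns, widths=COLUMN_WIDTHS):
    """A column store opens only the columns the query references."""
    return n_rows * sum(widths[c] for c in columns)


if __name__ == "__main__":
    rows = 1000
    # Filter on the first column and display the sixth (indexes 0 and 5).
    print(bytes_read_row_store(rows))
    print(bytes_read_column_store(rows, [0, 5]))
```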
Concept#4 Logical Partition is Good
Select sum(col3) where col2 = 2014
• Complete Dataset (1 billion rows): 2012 Data, 180 million ….. 2014 Data, 500 million
• Partitioned Data (500M rows)
Stringer
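A minimal sketch of partition pruning for the query above, assuming rows are `(col2, col3)` pairs and that the data is partitioned by the year held in col2. The helper names are hypothetical.

```python
from collections import defaultdict


def partition_by_year(rows):
    """Group col3 values by the year in col2 (the logical partition key)."""
    partitions = defaultdict(list)
    for year, value in rows:
        partitions[year].append(value)
    return partitions


def sum_for_year(partitions, year):
    """Answer 'select sum(col3) where col2 = year' from one partition only.

    The other partitions are never scanned, which is the whole benefit.
    """
    return sum(partitions.get(year, []))


if __name__ == "__main__":
    rows = [(2012, 10), (2014, 5), (2014, 7), (2013, 1)]
    parts = partition_by_year(rows)
    print(sum_for_year(parts, 2014))
```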
Concept#5 Data Skew Partition is Good (Paging)
Select sum(col3) where col2 = 2014
• 2012 Data: 180 million ….. 2014 Data: 500 million
• Whole partition loaded: 500 million rows in memory
• Partition split into pages (5 million … 5 million): 5 million rows in memory
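The paging idea can be sketched as splitting one oversized partition into fixed-size pages and aggregating page by page, so that only one page needs to be held at a time. The function names are hypothetical, and the in-memory list is a stand-in for data that would normally be read from storage.

```python
def page_ranges(total_rows, page_size):
    """Return (start, end) row ranges that cover a partition page by page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [(start, min(start + page_size, total_rows))
            for start in range(0, total_rows, page_size)]


def paged_sum(values, page_size):
    """Sum a large partition one page at a time."""
    total = 0
    for start, end in page_ranges(len(values), page_size):
        total += sum(values[start:end])  # only this slice is processed
    return total


if __name__ == "__main__":
    # A 500 million row partition cut into 5 million row pages.
    print(len(page_ranges(500_000_000, 5_000_000)))
    print(paged_sum(list(range(10)), 3))
```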
Concept#6 Search Index may lead to Index Explosion
• Index size is X times larger than the original data size
• Index size is X times smaller than the original data size
Repeated Value / Unique Value
1 2 2 2 8 4
1 2 2 2 8 4
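One way to see why value repetition matters for index size is to build a toy inverted index (value to row ids) and count its keys and postings. This is a sketch under the assumption that the index keeps one key per distinct value; `build_inverted_index` and `index_stats` are hypothetical names.

```python
def build_inverted_index(values):
    """Map each distinct value to the list of row ids where it occurs."""
    index = {}
    for row_id, value in enumerate(values):
        index.setdefault(value, []).append(row_id)
    return index


def index_stats(values):
    """Return (distinct keys, total postings) for a column's inverted index."""
    index = build_inverted_index(values)
    postings = sum(len(row_ids) for row_ids in index.values())
    return len(index), postings


if __name__ == "__main__":
    print(index_stats([1, 2, 2, 2, 8, 4]))   # the column shown on the slide
    print(index_stats([1, 2, 3, 4, 5, 6]))   # all values unique
    print(index_stats([2, 2, 2, 2, 2, 2]))   # one value repeated
```

With unique values every row adds a new key as well as a posting, so the index carries at least as many entries as the data itself; with repeated values the key count stays small.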
Concept#7 Concurrent Users First, Single Query Performance Next
• 1 User, 10% CPU, 200ms
• 1 User, 70% CPU, 175ms
Support 6 Concurrent Users
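The trade-off can be put into rough numbers with a naive capacity estimate. This sketch assumes CPU is the only bottleneck and ignores contention and overhead, so it gives an upper bound rather than a measured figure; the function names are hypothetical.

```python
def max_concurrent_users(cpu_percent_per_query):
    """Naive upper bound on simultaneous queries if CPU is the only limit."""
    if cpu_percent_per_query <= 0:
        raise ValueError("cpu_percent_per_query must be positive")
    return int(100 // cpu_percent_per_query)


def queries_per_second(concurrent_users, latency_seconds):
    """Throughput when each user issues queries back to back."""
    return concurrent_users / latency_seconds


if __name__ == "__main__":
    # Engine A: 10% CPU per query at 200 ms. Engine B: 70% CPU at 175 ms.
    for cpu, latency in [(10, 0.200), (70, 0.175)]:
        users = max_concurrent_users(cpu)
        print(cpu, users, queries_per_second(users, latency))
```

Under these assumptions the slightly slower engine serves far more queries per second, because it leaves CPU headroom for other users.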
Concept#8 High Throughput File level Snapshot Loading
• Insert 1 row in 1 sec; 1 million rows in 1 sec
• Insert 1 row in 1 ms; 1 million rows in 1 hour
Backup
Move the snapshot file
Distributed Index Building
Splitting
Compaction
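The bulk-load figures above translate into throughput as follows. The sketch only does the arithmetic on the two "1 million rows" numbers from the slide; `rows_per_second` is a hypothetical helper.

```python
def rows_per_second(rows, seconds):
    """Sustained load throughput for a batch of rows."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rows / seconds


if __name__ == "__main__":
    row_level = rows_per_second(1_000_000, 3600)  # 1 million rows in 1 hour
    snapshot = rows_per_second(1_000_000, 1)      # 1 million rows in 1 sec
    print(round(row_level), round(snapshot))
    print(round(snapshot / row_level))            # relative speed-up
```

A fast single-row insert does not imply fast bulk loading, which is why the slide compares the two at the million-row scale.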
Concept#9 Calculate cost upfront
Support existing SQLs, No new servers, New Process Instance, New Language, No Monitoring
• Hardware Cost Per Byte: SSD-RAM, Engine Efficiency, Spot Instance - Reserved Instance, Indexes @ Compute Node - Data Node
• Maintenance Cost: Skill Acquisition, Dashboard
• App Dev/Migration Cost: Existing SQLs to custom SQL/JSON
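The three cost buckets above could be combined into a simple upfront estimate along these lines. Every number and parameter name here is a placeholder assumption for illustration; none of them come from the talk.

```python
def upfront_cost(data_gb, cost_per_gb, index_ratio, maintenance, migration):
    """Rough total cost: hardware for data plus index, plus fixed costs.

    index_ratio is the stored size (data and indexes) relative to the raw
    data, e.g. 1.5 means indexes add 50% on top of the data.
    """
    hardware = data_gb * index_ratio * cost_per_gb
    return hardware + maintenance + migration


if __name__ == "__main__":
    # Placeholder inputs: 1000 GB of data, 0.5 currency units per GB,
    # indexes adding 50%, and fixed maintenance and migration costs.
    print(upfront_cost(data_gb=1000, cost_per_gb=0.5, index_ratio=1.5,
                       maintenance=200, migration=300))
```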
Concept#10 Data Structure makes a Big Difference
Data structures compared: CSV/JSON/TSV, KV, Secondary Index, Inverted Index, LazySorted, Binary Serde
Operations compared: Append, Update, Delete, GET, Select (Repeat Data), Select (Non-Repeat Data), Filter (Repeat Data), Filter (Non-Repeat Data), Nulls
* Custom Variations: RC File, ORC File, Parquet
1. Size Reduction on Index
2. Compressibility
3. Fast Access
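As one small example of how the choice of data structure affects size on repeated data, the sketch below run-length encodes a column. Run-length encoding is used here only as a familiar illustration and is not claimed to be what HSearch or the file formats named above do internally.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    values = []
    for value, count in runs:
        values.extend([value] * count)
    return values


if __name__ == "__main__":
    column = [1, 2, 2, 2, 8, 4]
    encoded = rle_encode(column)
    print(encoded)
    print(rle_decode(encoded) == column)
```

Repeated data shrinks to a few runs while non-repeated data gains nothing, which matches the slide's split between Repeat Data and Non-Repeat Data.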
10 CONCEPT DEMONSTRATION
HSEARCH DEMO
Columns: HVAC ID, BuildingID, READING_TIME, INLET TEMP, OUTLET TEMP, ERROR MESSAGE
