Ten things to consider for Interactive Analytics on high volume, write-once workloads
Full talk and demo at Fifth Elephant 2014
Abinash Karan
abinash@Bizosys.com
www.bizosys.com
About
• CTO and Co-Founder at Bizosys Technologies since 2009
• Created HSearch, a real-time, distributed search and analytics engine built on the Hadoop platform
• Passionate about distributed systems and data structures
• Speaker at Fifth Elephant 2013, Microsoft TechEd 2012, Yahoo Hadoop India Summit 2011
• Developed the partitioning and read-optimized data structure modules for HSearch
• Worked with a range of search products including Lucene, Solr, Endeca and FAST
• Engineering graduate of NIT Rourkela
Summary of what you will hear
CONTEXT: Write-once data load, e.g. time-series data.
Which Database?
1. SSD is Good
2. MPP is Good
3. Columnar is Good
4. Logical Partition is Good
5. Data Skew Partition is Good
6. Search Engine Index could lead to Index Explosion
7. Concurrent Users First, Single Query Performance Next
8. High Throughput File level Snapshot Loading
9. Calculate cost upfront
10. Data Structure makes a Big Difference
Which Database?
HBase
MongoDB
Shark
SAP HANA
i1010
HSearch
Riak
Hive
Dremel
1010data
Memcached
FoundationDB
Splunk
Elasticsearch
DynamoDB
Datameer
LevelDB
Netezza
Oracle TimesTen
Aerospike
Sybase IQ
Vertica
Accumulo
Hypertable
Solr
Concept#1 SSD and RAM are Good
Data Node: Application Server, DB Instance
• Network: 50 microseconds
• Disk access: 20 milliseconds
• SSD: 100 microseconds
• RAM: 100 nanoseconds
Data Node: Application Server, Database Node
• Network: 50 microseconds
• DISK
• Data-hotness-based caching
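A rough way to see why hotness-based caching matters is to weight the latencies quoted on the slide by how often each tier serves a read. The sketch below does that in Python. The hit ratios are made-up assumptions for illustration, not measurements from the talk, and the function name is hypothetical.

```python
# Latencies quoted on the slide, expressed in seconds.
LATENCY_S = {"ram": 100e-9, "ssd": 100e-6, "disk": 20e-3}
NETWORK_S = 50e-6


def expected_read_latency(hit_ratio):
    """Average latency of one read, given the fraction of reads served by each tier."""
    if abs(sum(hit_ratio.values()) - 1.0) > 1e-9:
        raise ValueError("hit ratios must sum to 1")
    storage = sum(LATENCY_S[tier] * share for tier, share in hit_ratio.items())
    return NETWORK_S + storage


# Assumed workloads: everything on disk versus mostly-hot data cached in RAM and SSD.
all_disk = expected_read_latency({"ram": 0.0, "ssd": 0.0, "disk": 1.0})
tiered = expected_read_latency({"ram": 0.80, "ssd": 0.15, "disk": 0.05})
print(f"all disk: {all_disk * 1e3:.3f} ms, tiered: {tiered * 1e3:.3f} ms")
```

Even with only 5% of reads falling through to disk in this assumed mix, disk access still dominates the average, which is the argument for keeping hot data in RAM and SSD.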
Concept#2 MPP is Good
[Diagram: Application Server, MPP Node and Database Node (SSD, RAM, DISK); labels: Computed Data, All Data, MPP Processing?]
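The MPP idea on this slide, where each node processes its own data and only computed data travels to the application server, can be sketched as a scatter-gather sum. This is a minimal single-process illustration using threads to stand in for nodes. The data and function names are hypothetical and it is not how HSearch or any specific MPP engine is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the rows held by one MPP node.
NODE_DATA = [
    [3, 5, 8],
    [13, 21],
    [34, 55, 89, 144],
]


def local_sum(rows):
    """Runs next to the data; only this single number travels back."""
    return sum(rows)


def mpp_sum(node_data):
    """Scatter the aggregation to every node, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
        partials = list(pool.map(local_sum, node_data))
    return partials, sum(partials)


partials, total = mpp_sum(NODE_DATA)
print(partials, total)
```

The application server receives one partial result per node instead of every row, so network transfer grows with the number of nodes rather than with the data size.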
Concept#3 Columnar is Good
12 2 2 8 4
12
228 bytes
Opens 84 Bytes*
* Filter on Col1 and Display Col6
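The saving from a columnar layout for a query that filters on one column and displays another can be estimated with simple arithmetic. The sketch below uses hypothetical fixed column widths, not the byte counts from the slide, and assumes the row store must read whole rows.

```python
# Hypothetical fixed column widths in bytes; not the figures from the slide.
COLUMN_WIDTHS = {"col1": 4, "col2": 8, "col3": 8, "col4": 16, "col5": 32, "col6": 8}


def bytes_read(num_rows, needed_columns, columnar):
    """Bytes scanned for a query that needs only `needed_columns`."""
    if columnar:
        per_row = sum(COLUMN_WIDTHS[c] for c in needed_columns)
    else:
        # A row store has to read whole rows to reach the needed columns.
        per_row = sum(COLUMN_WIDTHS.values())
    return num_rows * per_row


rows = 1_000_000
row_store = bytes_read(rows, ["col1", "col6"], columnar=False)
column_store = bytes_read(rows, ["col1", "col6"], columnar=True)
print(row_store, column_store)
```

The wider the table and the fewer columns a query touches, the larger the gap becomes.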
Concept#4 Logical Partition is Good
Select sum(col3) where col2 = 2014
Complete Dataset (1 billion rows)
2012 Data: 180 Million
…..
2014 Data: 500 Million
Partitioned Data (500M rows)
Stringer
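The effect of logical partitioning on the slide's query can be sketched with a dictionary keyed by year. The rows are tiny hypothetical stand-ins for the 1 billion row dataset and the function names are illustrative only.

```python
# Hypothetical rows of (col2 = year, col3 = value), grouped into one partition per year.
PARTITIONS = {
    2012: [(2012, 10), (2012, 20)],
    2013: [(2013, 5)],
    2014: [(2014, 7), (2014, 8), (2014, 9)],
}


def sum_col3_full_scan(partitions, year):
    """Without pruning: every row of every partition is examined."""
    scanned = 0
    total = 0
    for rows in partitions.values():
        for col2, col3 in rows:
            scanned += 1
            if col2 == year:
                total += col3
    return total, scanned


def sum_col3_pruned(partitions, year):
    """With pruning: the filter value selects the partition directly."""
    rows = partitions.get(year, [])
    return sum(col3 for _, col3 in rows), len(rows)


full = sum_col3_full_scan(PARTITIONS, 2014)
pruned = sum_col3_pruned(PARTITIONS, 2014)
print(full, pruned)
```

Both functions return the same sum, but the pruned version only touches rows in the 2014 partition.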
Concept#5 Data Skew Partition is Good (Paging)
Select sum(col3) where col2 = 2014
2012 Data: 180 Million
…..
2014 Data: 500 Million
500 Million rows in memory
5 Million … 5 Million
5 Million rows in memory
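One reading of the paging idea is that a skewed, oversized partition is processed in fixed-size pages so that only one page is resident at a time. The sketch below illustrates that reading with a scaled-down hypothetical partition. The page size and data are assumptions.

```python
from itertools import islice


def pages(rows, page_size):
    """Yield successive lists of at most `page_size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


def paged_sum(rows, page_size):
    """Sum values page by page and report the largest page held at once."""
    total = 0
    peak_rows_in_memory = 0
    for page in pages(rows, page_size):
        peak_rows_in_memory = max(peak_rows_in_memory, len(page))
        total += sum(page)
    return total, peak_rows_in_memory


# Hypothetical col3 values for the large 2014 partition, scaled down to 500 rows.
col3_2014 = range(1, 501)
result = paged_sum(col3_2014, page_size=5)
print(result)
```

The aggregate is the same as a single pass over the whole partition, while peak memory is bounded by the page size rather than by the partition size.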
Concept#6 Search Index may lead to Index Explosion
Index size is X times larger than the original data size
Index size is X times smaller than the original data size
Repeated Value
Unique Value
1 2 2 2 8 4
1 2 2 2 8 4
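The contrast between repeated and unique values can be seen by building a toy inverted index and counting its keys. This is a simplified sketch with hypothetical columns. Real search engines add further per-term structures, so the actual size ratios will differ.

```python
from collections import defaultdict


def build_inverted_index(values):
    """Map each distinct value to the list of row ids where it occurs."""
    index = defaultdict(list)
    for row_id, value in enumerate(values):
        index[value].append(row_id)
    return index


def distinct_keys(index):
    """Number of dictionary entries the index must store."""
    return len(index)


repeated = [2014] * 1000      # hypothetical low-cardinality column
unique = list(range(1000))    # hypothetical high-cardinality column

repeated_keys = distinct_keys(build_inverted_index(repeated))
unique_keys = distinct_keys(build_inverted_index(unique))
print(repeated_keys, unique_keys)
```

With repeated values the index holds one key and a single posting list. With unique values it holds one key and one posting per row, so the index stores at least as many entries as the data itself.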
Concept#7 Concurrent Users First, Single Query Performance Next
1 User, 10% CPU, 200ms
1 User, 70% CPU, 175ms
Support 6 Concurrent Users
Concept#8 High Throughput File level Snapshot Loading
Insert 1 row in 1 sec, 1 million rows in 1 sec
Insert 1 row in 1 ms, 1 million rows in 1 hour
Backup
Move the snapshot file
Distributed Index Building
Splitting
Compaction
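File-level snapshot loading can be illustrated as writing a complete file off to the side and then moving it into place in one step. The sketch below uses CSV and the Python standard library purely for illustration. The function name, file name and data are hypothetical and this is not the HSearch loader.

```python
import csv
import os
import tempfile


def publish_snapshot(rows, data_dir, name):
    """Write all rows to a temporary file, then move it into place in one step."""
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    final_path = os.path.join(data_dir, name)
    # os.replace renames within the same directory, so readers see either
    # the old snapshot or the complete new one, never a half-written file.
    os.replace(tmp_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as data_dir:
    # Hypothetical readings: (id, value)
    rows = [(i, i * 2) for i in range(1000)]
    path = publish_snapshot(rows, data_dir, "snapshot.csv")
    with open(path, newline="") as handle:
        loaded = list(csv.reader(handle))
    print(len(loaded))
```

The cost of the move does not depend on the number of rows, which is why throughput per snapshot can be high even when the latency for any single row is not.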
Concept#9 Calculate cost upfront
Support existing SQLs, No new servers
New Process Instance
New Language
No Monitoring
Hardware Cost Per Byte: SSD-RAM, Engine Efficiency, Spot Instance – Reserved Instance, Indexes @ Compute Node - Data Node
Maintenance Cost: Skill Acquisition, Dashboard
App Dev/Migration Cost: Existing SQLs to custom SQL/JSON
Concept#10 Data Structure makes a Big Difference
Data structures compared: CSV/JSON/TSV, KV, Secondary Index, Inverted Index, LazySorted Binary Serde
Operations compared: Append, Update, Delete, GET, Select (Repeat Data), Select (Non-Repeat Data), Filter (Repeat Data), Filter (Non-Repeat Data), Nulls
* Custom Variations: RC File, ORC File, Parquet
1. Size Reduction on Index
2. Compressibility
3. Fast Access
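As one illustration of how the choice of data structure affects size and access speed for repeated data, the sketch below uses run-length encoding. This technique is chosen here as an example and is not named on the slide. The column contents and function names are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_sum(encoded):
    """Sum the original column without expanding the runs."""
    return sum(value * count for value, count in encoded)


def rle_count_equal(encoded, target):
    """Count rows equal to `target` by inspecting one pair per run."""
    return sum(count for value, count in encoded if value == target)


# Hypothetical sorted column with many repeated values.
column = [2012] * 3 + [2013] * 2 + [2014] * 5
encoded = rle_encode(column)
print(encoded, rle_sum(encoded), rle_count_equal(encoded, 2014))
```

Ten values shrink to three pairs, and both the aggregate and the filter run over the pairs rather than the rows. On a column of unique values the same encoding would give no reduction.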
10 CONCEPT DEMONSTRATION
HSEARCH DEMO
HVAC ID | BuildingID | READING_TIME | INLET TEMP | OUTLET TEMP | ERROR MESSAGE

