HPCC Systems - Big Data
Roxie - In-Memory Data & Index, Sub-Second Query Cluster
By Fujio Turner
@FujioTurner
Who is LexisNexis? What is HPCC Systems?
LexisNexis is a provider of legal, tax, regulatory, news, and business information and analysis to the legal, corporate, government, accounting and academic markets.
LexisNexis has been in business since 1977 and has over 30,000 employees worldwide.
LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking and vertical expertise, and supports HPCC Systems as an open source project under the Apache 2.0 License.
Comparison (left column vs. right column)
Block Based | File Based
JAVA | C++
Petabytes | Exabytes
1-80,000 Jobs/day | Non-Indexed: 4X-13X; Indexed: 2K-3K Jobs/sec
Since 2005 | Since 2000
? ? ? ? ? ? | Thor, Roxie
Non-Indexed Full Data Set 
1 20 
Customers Development Business 
http://hpccsystems.com/why-hpcc/benchmarks
How do I Query HPCC Systems?
ECL (Enterprise Control Language) is a C++ based query language for use with the HPCC Systems Big Data platform. ECL's syntax and format are simple and easy to learn.
Note - ECL is similar to Hadoop's Pig, but more expressive and feature rich.
ECL (Enterprise Control Language) 
C++ based query language 
SQL w/ JOINS 
Map/Reduce 
GraphDB 
Machine Learning
Simple to Complex Queries
“I'm sub-second fast.”
“I can query all or part of your data.”
Cluster Architecture 
Thor Roxie 
Hard Disk 
Index(optional) 
Hard Disk 
Index(optional) 
In-memory Index 
SSD 
Either/Both 
Roxie - Rapid Online XML Inquiry Engine
Date: 2004
Query Language: ECL
In-Memory: Index Only, or Index w/ Part or All Data
Data Type: Normalized, or DeNormalized, or Unstructured
Query Methods: REST, Direct TCP, SOAP
Example 1
File -> Load Into Thor -> Query
VS
File -> Load Into Roxie -> Index -> Publish -> Query
HPCC Systems Sample Data for Example 1
Sample Data 
http://hpccsystems.com/download/docs/learning-ecl
Administrator Web GUI on the IP / URL of the HPCC install, port 8010
Load Data
1. Upload file*
2. Distribute to cluster
3. Name of file in cluster
4. Size of each row
5. Push to cluster
*2GB file size limit through the web GUI. No limit if uploaded via SOAP.
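Step 4 asks for the size of each row because a fixed-width file can only be split across nodes on row boundaries. The following is a minimal Python sketch of that idea, assuming a simple even split. The function name `partition_fixed` and the splitting policy are illustrative assumptions, not the platform's actual distribution logic.

```python
from typing import List


def partition_fixed(data: bytes, record_size: int, nodes: int) -> List[bytes]:
    """Split fixed-width data into `nodes` parts without cutting a row in half."""
    if record_size <= 0 or nodes <= 0:
        raise ValueError("record_size and nodes must be positive")
    if len(data) % record_size != 0:
        raise ValueError("data length is not a multiple of the row size")

    total_rows = len(data) // record_size
    # Every node gets `base` rows; the first `extra` nodes get one more.
    base, extra = divmod(total_rows, nodes)

    parts = []
    start_row = 0
    for node in range(nodes):
        rows_here = base + (1 if node < extra else 0)
        start = start_row * record_size
        end = (start_row + rows_here) * record_size
        parts.append(data[start:end])
        start_row += rows_here
    return parts
```

Without the row size, the split points could land in the middle of a record, which is presumably why the upload form needs it.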
Loaded in Thor Cluster
Query - Example 1
Data Schema
Layout_People := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5 Zip;
  STRING42 Street;
  STRING20 City;
  STRING2 State;
END;
Query
// File location; the SQL analogy is "USE DATABASE;" and "FROM Table"
allPeople := DATASET('~test::originalperson', Layout_People, THOR);
// SQL analogy: WHERE `LastName` = 'Smith'
smith := allPeople(LastName = 'Smith');
// Output; SQL analogy: "SELECT * ...."
smith;
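To make the schema and the filter concrete, here is a hedged Python sketch that models one Layout_People row as a fixed-width byte string and the `allPeople(LastName = 'Smith')` filter as a full scan. The field widths come from the STRINGn declarations in the slide. The helper names (`make_row`, `parse_row`, `filter_last_name`) are hypothetical and exist only for this illustration.

```python
# Field widths copied from Layout_People (STRINGn occupies n bytes).
LAYOUT_PEOPLE = [
    ("FirstName", 15),
    ("LastName", 25),
    ("MiddleName", 15),
    ("Zip", 5),
    ("Street", 42),
    ("City", 20),
    ("State", 2),
]

# Sum of all field widths: the "size of each row" for this layout.
RECORD_SIZE = sum(width for _, width in LAYOUT_PEOPLE)


def make_row(**fields) -> bytes:
    """Hypothetical helper: build one space-padded fixed-width row."""
    return b"".join(
        fields.get(name, "").ljust(width)[:width].encode("ascii")
        for name, width in LAYOUT_PEOPLE
    )


def parse_row(raw: bytes) -> dict:
    """Slice one fixed-width row into named fields, dropping the padding."""
    if len(raw) != RECORD_SIZE:
        raise ValueError("row has the wrong length")
    row, offset = {}, 0
    for name, width in LAYOUT_PEOPLE:
        row[name] = raw[offset:offset + width].decode("ascii").rstrip()
        offset += width
    return row


def filter_last_name(rows, last_name):
    """Full scan: test every row, as a non-indexed query would."""
    return [row for row in rows if row["LastName"] == last_name]
```

Because no index is involved, every row is examined. That is the cost the indexed Roxie example later in the deck avoids.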
Copy Data From Thor to Roxie
1. Select Thor File(s)
2. Copy Tab
3. Pick Roxie cluster
4. Click Copy
Indexing - In-Memory
Ex. Creating an index by “LastName”
// RecPtr is the file position number, a pseudo record ID; the SQL analogy is "ALTER TABLE" (new column)
allPeople := DATASET('~test::originalperson_copy',
                     {Layout_People, UNSIGNED8 RecPtr {virtual(fileposition)}},
                     FLAT);
// Make Index: key fields, index filename, In-Memory (PRELOAD)
rx := INDEX(allPeople, {LastName, RecPtr}, '~test::key_person_copy', PRELOAD);
BUILDINDEX(rx);
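The index above pairs each LastName with the file position of its row. As a rough Python sketch of that structure (an illustration of the concept, not how the platform stores keys), a sorted list of (LastName, RecPtr) pairs can be searched with a binary search. The names `build_index` and `lookup` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_index(last_names, record_size):
    """Return sorted (LastName, RecPtr) pairs; RecPtr is the row's byte offset."""
    return sorted((name, i * record_size) for i, name in enumerate(last_names))


def lookup(index, last_name):
    """Binary-search the sorted index for every RecPtr stored under last_name."""
    lo = bisect_left(index, (last_name,))
    hi = bisect_right(index, (last_name, float("inf")))
    return [rec_ptr for _, rec_ptr in index[lo:hi]]
```

A lookup touches only a handful of index entries instead of scanning every row, which is the point of building the key before querying.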
Indexing - In-Memory with Luggage
Index Only
rx := INDEX(allPeople, {LastName, RecPtr}, '~test::key_person_copy', PRELOAD);
Index + Part or All Data (store data in memory with the index)
// Key fields come first, then the payload ("luggage") fields stored with the index
rx := INDEX(allPeople, {LastName, RecPtr}, {FirstName, MiddleName}, '~test::key_person_copy', PRELOAD);
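The difference between an index-only key and an index with luggage can be sketched in Python as follows. This is a simplified model that uses a dictionary rather than a sorted key file, and the function names are hypothetical. The luggage fields travel with the key, so a query that needs only those fields never reads the base file.

```python
from collections import defaultdict


def build_payload_index(rows, key_field, payload_fields):
    """Map each key value to the payload ("luggage") fields stored with it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_field]].append({f: row[f] for f in payload_fields})
    return dict(index)


def query_payload(index, key_value):
    """Answer from the index alone; the base file is never read."""
    return index.get(key_value, [])
```

The trade-off is memory: every field carried as luggage makes the in-memory index larger.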
Query w/ Index
// Data
allPeople := DATASET('~test::originalperson_copy', {Layout_People, UNSIGNED8 RecPtr {virtual(fileposition)}}, FLAT);
// Index
datax := INDEX(allPeople, {LastName, RecPtr}, '~test::key_person_copy');
// Query; SQL analogy: WHERE `LastName` = 'Smith', resolved from the index
filterdata := FETCH(allPeople, datax(LastName = 'Smith'), RIGHT.RecPtr);
filterdata; // Output
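Conceptually, FETCH takes the file positions that matched in the index and reads only those rows from the base file. Here is a small Python sketch of that step, assuming an 8-byte row to keep the sample readable. The row size, the sample data and the function name `fetch` are all illustrative assumptions.

```python
import io

RECORD_SIZE = 8  # hypothetical small row size, chosen only for this sample


def fetch(base_file, rec_ptrs, record_size=RECORD_SIZE):
    """Seek to each file position taken from the index and read one full row."""
    rows = []
    for ptr in rec_ptrs:
        base_file.seek(ptr)
        rows.append(base_file.read(record_size).decode("ascii").rstrip())
    return rows


# Three 8-byte rows; the index maps LastName to the matching file positions.
base = io.BytesIO(b"Smith   Jones   Smith   ")
index = {"Smith": [0, 16], "Jones": [8]}
smiths = fetch(base, index["Smith"])
```

Only the rows named by the index are read, so the cost depends on the number of matches rather than the size of the file.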
Publish Your Code
“I'm sub-second fast.”
What is publishing your code?
ECL is translated to C++, so your ECL needs to be compiled before it runs. When you publish a query, you pre-compile your ECL and send it to the ESP server, where it is stored. ESP, listening on port 8002, accepts requests and executes the published query.
Can I do ad-hoc queries without publishing?
Yes, but they will not be sub-second fast because they are not pre-compiled.
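The compile-once idea behind publishing can be illustrated with a loose Python analogy. This is not how ESP or Roxie work internally: the registry, the function names, and the use of Python's built-in `compile` and `eval` are all assumptions made for the sketch. A published query is compiled when it is stored, while an ad-hoc query pays the compile cost on every call.

```python
published = {}  # hypothetical registry: query name -> compiled code object


def publish(name, source):
    """Compile the query source once and store the result under a name."""
    published[name] = compile(source, f"<published:{name}>", "eval")


def run_published(name, variables):
    """Execute the stored, already-compiled query."""
    return eval(published[name], dict(variables))


def run_ad_hoc(source, variables):
    """Compile and execute in one step; compilation happens on every call."""
    return eval(compile(source, "<ad-hoc>", "eval"), dict(variables))
```

For example, `publish("lastname", "[p for p in people if p['LastName'] == 'Smith']")` stores the compiled query, and each later `run_published("lastname", {"people": rows})` call skips compilation.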
How Querying ESP Works
How to Publish Your ECL
1. Select your ad-hoc “lastname” query from before
2. Name your query “lastname”
3. Publish
ESP on port 8002
Enterprise Services Platform
1. Select “lastname” query
2. Select your data output format “Output JSON”
3. Click Submit button
JSON Format
1. Send Request = Query or hit this URL
JSON Format
Results - in less than a second
SuperFile - Organizing Your Files
Logical File: Twitter 2013-06 (use this file name to get all the real files)
Real Files: Twitter 2013-06-06, Twitter 2013-06-07, Twitter 2013-06-08
+ Append new real files on the fly
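A SuperFile can be pictured as one logical name that resolves to an ordered list of real files. The Python sketch below models only that idea. The class, its methods and the file names are hypothetical, and the `storage` dictionary stands in for the cluster's file system.

```python
class SuperFile:
    """One logical file name that resolves to a list of real sub-files."""

    def __init__(self, name):
        self.name = name
        self.subfiles = []  # logical names of the real files, in append order

    def add_subfile(self, logical_name):
        """Append a new real file; readers see it on the next read."""
        if logical_name in self.subfiles:
            raise ValueError(f"{logical_name} is already part of {self.name}")
        self.subfiles.append(logical_name)

    def read(self, storage):
        """Yield every row from every sub-file, as if they were one file."""
        for logical_name in self.subfiles:
            yield from storage[logical_name]
```

A query written against the SuperFile name does not change when a new daily file is appended, which is what makes appending on the fly practical.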
SuperKeys - Organizing Your Indexes
SuperKey: Twitter 2013
Keys: Twitter 2013-06-06, Twitter 2013-06-07, Twitter 2013-06-08
No Sub-Super Files or Keys in Roxie
For More HPCC “How To's”
Go to SlideShare
http://www.slideshare.net/FujioTurner/
Watch how to install HPCC Systems in 5 Minutes
http://www.youtube.com/watch?v=8SV43DCUqJg
Download HPCC Systems Open Source Community Edition
http://hpccsystems.com/download/
or Source Code
https://github.com/hpcc-systems


Big Data - In-Memory Index / Sub Second Query engine - Roxie - HPCC Systems

  • 1. HPCC Systems - Big Data. Roxie - In-Memory Data & Index, Sub-Second Query Cluster. By Fujio Turner, @FujioTurner
  • 2. Who is LexisNexis? What is HPCC Systems? LexisNexis is a provider of legal, tax, regulatory, news, and business information and analysis to legal, corporate, government, accounting, and academic markets. LexisNexis has been in business since 1977 and has over 30,000 employees worldwide. LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking, and vertical expertise, and it supports HPCC Systems as an open source project under the Apache 2.0 License.
  • 3. Comparison (platform logos not extracted). Left column: Block Based; Java; Petabytes; 1-80,000 Jobs/day; Since 2005. Right column: File Based; C++; Exabytes; Non-Indexed 4X-13X; Indexed: 2K-3K Jobs/sec; Since 2000. ? ? ? ? ? ? Thor Roxie
  • 4. Non-Indexed Full Data Set 1 20 Customers Development Business http://hpccsystems.com/why-hpcc/benchmarks
  • 5. How do I Query HPCC Systems? ECL (Enterprise Control Language) is a C++ based query language for use with the HPCC Systems Big Data platform. ECL's syntax and format are simple and easy to learn. Note: ECL is very similar to Hadoop's Pig, but more expressive and feature-rich.
  • 6. ECL (Enterprise Control Language): C++ based query language. Simple to Complex Queries: SQL w/ JOINS, Map/Reduce, GraphDB, Machine Learning.
  • 7. "I'm sub-second fast." "I can query all or part of your data." Cluster Architecture: Thor, Roxie. Hard Disk, Index (optional); Hard Disk, Index (optional), In-memory Index, SSD, Either/Both. Rapid Online XML Inquiry Engine.
  • 8. Roxie. Date: 2004. Query Languages: ECL. In-Memory: Index Only or Index w/ Part or All Data. Data Type: Normalized or DeNormalized or Unstructured. Query Methods: REST, Direct TCP, SOAP.
  • 9. Example 1. File -> Load Into Thor -> Query VS File -> Load Into Roxie -> Index -> Publish -> Query
  • 10. HPCC Systems Sample Data for Examples 1 Sample Data http://hpccsystems.com/download/docs/learning-ecl
  • 11. Administrator Web GUI on IP / URL of HPCC install, port 8010
  • 12. Load Data: 1. Upload file* 2. Distribute to cluster 3. Name of file in cluster 4. Size of each row 5. Push to cluster. *2GB file size limit through web; no limit if uploaded via SOAP.
  • 13. Loaded In Thor Cluster
  • 14. Query, Example 1
      // Data schema
      Layout_People := RECORD
        STRING15 FirstName;
        STRING25 LastName;
        STRING15 MiddleName;
        STRING5  Zip;
        STRING42 Street;
        STRING20 City;
        STRING2  State;
      END;
      // File location; SQL analogy: "USE DATABASE;" and "FROM Table"
      allPeople := DATASET('~test::originalperson', Layout_People, THOR);
      // SQL analogy: WHERE `LastName` = 'Smith'
      smith := allPeople(LastName = 'Smith');
      // Output; SQL analogy: "SELECT * ...."
      smith;
  • 15. Copy Data From Thor to Roxie
  • 16. 1. Select Thor File(s) 2. Copy Tab 3. Pick Roxie cluster 4. Click Copy
  • 17. Indexing In-Memory. Ex. creating an index by "LastName":
      // RecPtr: file position number, a pseudo record ID; SQL analogy: "Alter Table" (new column)
      allPeople := DATASET('~test::originalperson_copy',
                           {Layout_People, UNSIGNED8 RecPtr {virtual(fileposition)}},
                           FLAT);
      // Make index; '~test::key_person_copy' is the index filename; PRELOAD = in-memory
      rx := INDEX(allPeople, {LastName, RecPtr}, '~test::key_person_copy', PRELOAD);
      BUILDINDEX(rx);
  • 18. Indexing In-Memory with Luggage
      // Index only
      rx := INDEX(allPeople, {LastName, RecPtr}, '~test::key_person_copy', PRELOAD);
      // Index + part or all data: store data in-memory with the index
      rx := INDEX(allPeople, {FirstName, MiddleName}, {LastName, RecPtr}, '~test::key_person_copy', PRELOAD);
  • 19. Query w/ Index
      // Data
      allPeople := DATASET('~test::originalperson_copy',
                           {Layout_People, UNSIGNED8 RecPtr {virtual(fileposition)}},
                           FLAT);
      // Index
      datax := INDEX(allPeople, {LastName, RecPtr}, '~test::key_person_copy');
      // SQL analogy: WHERE `LastName` = 'Smith', from the index
      filterdata := FETCH(allPeople, datax(LastName = 'Smith'), RIGHT.RecPtr);
      // Output
      filterdata;
  • 20. "I'm sub-second fast." Publish Your Code. What is publishing your code? ECL compiles to C++, so your ECL needs to be compiled before it runs. When you publish a query, you pre-compile your ECL and send it to the ESP server, where it is stored. ESP, on port 8002, listens for requests and executes the published query. Can I do ad-hoc queries without publishing? Yes, but they will not be sub-second fast because they are not pre-compiled.
  • 22. How to Publish Your ECL: 1. Select your ad-hoc "lastname" query from before 2. Name your query "lastname" 3. Publish
  • 23. ESP on port 8002 (Enterprise Service Platform): 1. Select "lastname" query 2. Select your data output format "Output JSON" 3. Click Submit button
  • 24. JSON Format: 1. Send Request = Query, or hit this URL
  • 25. JSON Format Results, in less than a second
  • 26. SuperFile: Organizing Your Files. Real files: Twitter 2013-06-06, Twitter 2013-06-07, Twitter 2013-06-08. Logical file (SuperFile): Twitter 2013-06, covering 2013-06-06 … -07 … -08. Use this file name to get all the real files. Append new real files on the fly.
  • 27. SuperKeys: Organizing Your Indexes. Keys: Twitter 2013-06-06, Twitter 2013-06-07, Twitter 2013-06-08. SuperKey: Twitter 2013. No Sub-Super Files or Keys in Roxie.
  • 28. For More HPCC "How To's", go to SlideShare: http://www.slideshare.net/FujioTurner/
  • 29. Watch how to install HPCC Systems in 5 Minutes: http://www.youtube.com/watch?v=8SV43DCUqJg. Download HPCC Systems Open Source Community Edition: http://hpccsystems.com/download/ or Source Code: https://github.com/hpcc-systems
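
The difference between an "index only" key and an index "with luggage" (slides 17 to 19) can be sketched outside ECL. The following is a minimal Python analogy, not how Roxie is implemented: the fixed-width row layout, the sample names, and every function name here are assumptions made for illustration. A key-only index maps LastName to a file position, so each match needs a second read from the data file (the role FETCH plays in slide 19). An index that also carries FirstName can answer that part of the query from the index alone.

```python
from bisect import bisect_left, bisect_right

# Hypothetical fixed-width row: 25 bytes LastName + 15 bytes FirstName.
ROW_SIZE = 40

def make_row(last, first):
    """Encode one person as a fixed-width row."""
    return last.ljust(25).encode("ascii") + first.ljust(15).encode("ascii")

def parse_row(raw):
    """Decode one fixed-width row back into a record."""
    text = raw.decode("ascii")
    return {"LastName": text[:25].rstrip(), "FirstName": text[25:].rstrip()}

# Illustrative sample data, standing in for the data file on disk.
people = [("Smith", "Ann"), ("Jones", "Bob"), ("Smith", "Cara"), ("Lee", "Dan")]
data_file = b"".join(make_row(last, first) for last, first in people)

def build_key_only_index(data):
    """Sorted (LastName, file position) pairs, analogous to {LastName, RecPtr}."""
    entries = []
    for pos in range(0, len(data), ROW_SIZE):
        row = parse_row(data[pos:pos + ROW_SIZE])
        entries.append((row["LastName"], pos))
    return sorted(entries)

def build_payload_index(data):
    """Same key, but each entry also carries FirstName as extra stored data."""
    entries = []
    for pos in range(0, len(data), ROW_SIZE):
        row = parse_row(data[pos:pos + ROW_SIZE])
        entries.append((row["LastName"], pos, row["FirstName"]))
    return sorted(entries)

def lookup(index, last_name):
    """Binary-search the sorted index for every entry whose key equals last_name."""
    keys = [entry[0] for entry in index]
    return index[bisect_left(keys, last_name):bisect_right(keys, last_name)]

def fetch(data, pos):
    """Read one full row from the data file at a given file position."""
    return parse_row(data[pos:pos + ROW_SIZE])

key_only = build_key_only_index(data_file)
with_payload = build_payload_index(data_file)

# Key-only index: the index yields positions, then each row is fetched from the data.
fetched = [fetch(data_file, pos) for _, pos in lookup(key_only, "Smith")]

# Index with extra data: first names come straight from the index, no data-file read.
from_payload = [first for _, _, first in lookup(with_payload, "Smith")]

print(fetched)
print(from_payload)
```

The trade-off is the usual one: carrying more fields in the index avoids the second read but makes the index larger, which matters when the index is held in memory.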