SlideShare a Scribd company logo
1 of 15
Download to read offline
HPCC Systems 
Loading csv Data 
& 
Querying 
By Fujio Turner 
@FujioTurner
Non-Indexed Full Data Set 
1 20 
Customers Development Business 
http://hpccsystems.com/why-hpcc/benchmarks
ECL (Enterprise Control Language) 
C++ based query language 
SQL w/ JOINS 
Map/Reduce 
GraphDB 
Machine 
Learning 
Simple to Complex Queries
“I’m sub-second 
fast.” 
“I can query all 
or part of your 
data.” 
Architecture 
Thor Roxie 
Hard Disk 
Index(optional) 
Hard Disk 
Index(optional) 
In-memory Index 
SSD 
Either/Both
Example 
File Load File into HPCC Query 
CSV data sample source 
http://catalog.data.gov/dataset/consumer-complaint-database
Administrator Web GUI! 
on 
IP / Url of HPCC install Port 8010
4. add ,t 
5. 
1. Upload file*! 
2. Distribute to cluster! 
3. Name of file in cluster! 
4. Most CSV have t! 
5. Push to cluster 
*2GB file size limit through web 
No limit if uploaded via SOAP 
Load !! ! ! Data
*optional file rename Loaded 
In Thor Cluster
How do I Query HPCC Systems ? 
What Is ECL? 
ECL (Enterprise Control Language) is a C++ based query 
language for use with HPCC Systems Big Data platform. 
ECLs syntax and format is very simple and easy to learn.! 
! 
Note - ECL is very similar to Hadoop’s pig ,but! 
more expressive and feature rich.
Query w/ ECL 
Com := DATASET(‘~test::complaints’,ComS, 
CSV(HEADING(1), SEPARATOR([',','t']))); 
ComS :=RECORD 
UNSIGNED3 ComplaintID; 
STRING23 Product; 
STRING38 State; 
…………………………. 
…………………………. 
STRING31Consumer_disputed; 
END; 
Ma := Com(State = ‘MA’); 
Ma; //output 
WHERE `State` = ‘MA’ 
File Type 
File Location,! 
“FROM Table” 
“USE DATABASE;” 
“SELECT * ….” 
Schema
1. Go to playground! 
2. Edit ECL! 
3. Pick “thor” Cluster! 
4. Submit 
Practice 
http://www.meetup.com/HPCC-SV/pages/ECL_EXAMPLE__- 
_CSV_LOAD_and_QUERY
Schema Made EZ 
http://hpccsystems.com/demos/data-profiling-demo 
CSV 
IN 
Schema 
Click OUT 
Storing a new file and want to make a quick schema? 
! 
Take a small part of your CSV data and 
go to the link below to make an ECL Schema
ECL Guide 
http://hpccsystems.com/download/docs/ecl-language-reference 
JOIN! 
MERGE! 
LENGTH! 
REGEX! 
ROUND! 
SUM! 
COUNT! 
TRIM! 
WHEN! 
AVE! 
ABS! 
CASE! 
DEDUP! 
NORMALIZE! 
DENORMALIZE! 
IF! 
SORT! 
GROUP! 
more ….
For More HPCC! 
“How To’s”! 
Go to SlideShare 
http://www.slideshare.net/FujioTurner/
Watch how to install 
HPCC Systems 
in 5 Minutes 
Download HPCC Systems 
Open Source 
Community Edition 
http://hpccsystems.com/download/ 
http://www.youtube.com/watch?v=8SV43DCUqJg 
or 
Source Code 
https://github.com/hpcc-systems

More Related Content

What's hot

Redis Developers Day 2015 - Secondary Indexes and State of Lua
Redis Developers Day 2015 - Secondary Indexes and State of LuaRedis Developers Day 2015 - Secondary Indexes and State of Lua
Redis Developers Day 2015 - Secondary Indexes and State of LuaItamar Haber
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
Native erasure coding support inside hdfs presentation
Native erasure coding support inside hdfs presentationNative erasure coding support inside hdfs presentation
Native erasure coding support inside hdfs presentationlin bao
 
Hypertable - massively scalable nosql database
Hypertable - massively scalable nosql databaseHypertable - massively scalable nosql database
Hypertable - massively scalable nosql databasebigdatagurus_meetup
 
Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Rupak Roy
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016Duyhai Doan
 
Database Architectures and Hypertable
Database Architectures and HypertableDatabase Architectures and Hypertable
Database Architectures and Hypertablehypertable
 
Hypertable
HypertableHypertable
Hypertablebetaisao
 
Hadoop Essential for Oracle Professionals
Hadoop Essential for Oracle ProfessionalsHadoop Essential for Oracle Professionals
Hadoop Essential for Oracle ProfessionalsChien Chung Shen
 
Redis/Lessons learned
Redis/Lessons learnedRedis/Lessons learned
Redis/Lessons learnedTit Petric
 
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Ben Busby
 
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchLet's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchInfluxData
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R StudioRupak Roy
 
Presentation at the EMBL-EBI Industry RDF meeting
Presentation at the EMBL-EBI  Industry RDF meetingPresentation at the EMBL-EBI  Industry RDF meeting
Presentation at the EMBL-EBI Industry RDF meetingJohannes Keizer
 
Parquet Twitter Seattle open house
Parquet Twitter Seattle open houseParquet Twitter Seattle open house
Parquet Twitter Seattle open houseJulien Le Dem
 
Spark Cassandra 2016
Spark Cassandra 2016Spark Cassandra 2016
Spark Cassandra 2016Duyhai Doan
 

What's hot (20)

Redis Developers Day 2015 - Secondary Indexes and State of Lua
Redis Developers Day 2015 - Secondary Indexes and State of LuaRedis Developers Day 2015 - Secondary Indexes and State of Lua
Redis Developers Day 2015 - Secondary Indexes and State of Lua
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
Native erasure coding support inside hdfs presentation
Native erasure coding support inside hdfs presentationNative erasure coding support inside hdfs presentation
Native erasure coding support inside hdfs presentation
 
Hadoop
HadoopHadoop
Hadoop
 
Hypertable - massively scalable nosql database
Hypertable - massively scalable nosql databaseHypertable - massively scalable nosql database
Hypertable - massively scalable nosql database
 
Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Introduction to hadoop ecosystem
Introduction to hadoop ecosystem
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
 
Database Architectures and Hypertable
Database Architectures and HypertableDatabase Architectures and Hypertable
Database Architectures and Hypertable
 
Hypertable
HypertableHypertable
Hypertable
 
Hadoop Essential for Oracle Professionals
Hadoop Essential for Oracle ProfessionalsHadoop Essential for Oracle Professionals
Hadoop Essential for Oracle Professionals
 
Redis/Lessons learned
Redis/Lessons learnedRedis/Lessons learned
Redis/Lessons learned
 
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
 
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchLet's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R Studio
 
Presentation at the EMBL-EBI Industry RDF meeting
Presentation at the EMBL-EBI  Industry RDF meetingPresentation at the EMBL-EBI  Industry RDF meeting
Presentation at the EMBL-EBI Industry RDF meeting
 
Parquet Twitter Seattle open house
Parquet Twitter Seattle open houseParquet Twitter Seattle open house
Parquet Twitter Seattle open house
 
Spark Cassandra 2016
Spark Cassandra 2016Spark Cassandra 2016
Spark Cassandra 2016
 
Full Text Search in PostgreSQL
Full Text Search in PostgreSQLFull Text Search in PostgreSQL
Full Text Search in PostgreSQL
 

Similar to Big Data - Load CSV File & Query the EZ way - HPCC Systems

Runtime Environment Of .Net Divya Rathore
Runtime Environment Of .Net Divya RathoreRuntime Environment Of .Net Divya Rathore
Runtime Environment Of .Net Divya RathoreEsha Yadav
 
Hacking Oracle From Web Apps 1 9
Hacking Oracle From Web Apps 1 9Hacking Oracle From Web Apps 1 9
Hacking Oracle From Web Apps 1 9sumsid1234
 
Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14Sri Ambati
 
Accelerated data access
Accelerated data accessAccelerated data access
Accelerated data accessgordonyorke
 
Advanced SQL Injection
Advanced SQL InjectionAdvanced SQL Injection
Advanced SQL Injectionamiable_indian
 
Sql Injection Adv Owasp
Sql Injection Adv OwaspSql Injection Adv Owasp
Sql Injection Adv OwaspAung Khant
 
OWB11gR2 - Extending ETL
OWB11gR2 - Extending ETL OWB11gR2 - Extending ETL
OWB11gR2 - Extending ETL Suraj Bang
 
Quantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container DayQuantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container DayPhil Estes
 
20090109 Dsl2cpp Md Workbench
20090109 Dsl2cpp Md Workbench20090109 Dsl2cpp Md Workbench
20090109 Dsl2cpp Md Workbenchazubi
 
"PHP from soup to nuts" -- lab exercises
"PHP from soup to nuts" -- lab exercises"PHP from soup to nuts" -- lab exercises
"PHP from soup to nuts" -- lab exercisesrICh morrow
 
Fargate 를 이용한 ECS with VPC 1부
Fargate 를 이용한 ECS with VPC 1부Fargate 를 이용한 ECS with VPC 1부
Fargate 를 이용한 ECS with VPC 1부Hyun-Mook Choi
 
C5 c++ development environment
C5 c++ development environmentC5 c++ development environment
C5 c++ development environmentsnchnchl
 
Exam 1z0 062 Oracle Database 12c: Installation and Administration
Exam 1z0 062 Oracle Database 12c: Installation and AdministrationExam 1z0 062 Oracle Database 12c: Installation and Administration
Exam 1z0 062 Oracle Database 12c: Installation and AdministrationKylieJonathan
 
Purdue CS354 Operating Systems 2008
Purdue CS354 Operating Systems 2008Purdue CS354 Operating Systems 2008
Purdue CS354 Operating Systems 2008guestd9065
 
Virtual platform
Virtual platformVirtual platform
Virtual platformsean chen
 
Migrating ESM Resources From Oracle to CORR-Engine for ESM 6.5c SP1
Migrating ESM Resources From Oracle to CORR-Engine for ESM 6.5c SP1Migrating ESM Resources From Oracle to CORR-Engine for ESM 6.5c SP1
Migrating ESM Resources From Oracle to CORR-Engine for ESM 6.5c SP1Protect724mouni
 

Similar to Big Data - Load CSV File & Query the EZ way - HPCC Systems (20)

Runtime Environment Of .Net Divya Rathore
Runtime Environment Of .Net Divya RathoreRuntime Environment Of .Net Divya Rathore
Runtime Environment Of .Net Divya Rathore
 
Database driven web pages
Database driven web pagesDatabase driven web pages
Database driven web pages
 
Hacking Oracle From Web Apps 1 9
Hacking Oracle From Web Apps 1 9Hacking Oracle From Web Apps 1 9
Hacking Oracle From Web Apps 1 9
 
Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14
 
Accelerated data access
Accelerated data accessAccelerated data access
Accelerated data access
 
Advanced SQL Injection
Advanced SQL InjectionAdvanced SQL Injection
Advanced SQL Injection
 
Sql Injection Adv Owasp
Sql Injection Adv OwaspSql Injection Adv Owasp
Sql Injection Adv Owasp
 
OWB11gR2 - Extending ETL
OWB11gR2 - Extending ETL OWB11gR2 - Extending ETL
OWB11gR2 - Extending ETL
 
Quantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container DayQuantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container Day
 
Ch23 system administration
Ch23 system administration Ch23 system administration
Ch23 system administration
 
htogcp.docx
htogcp.docxhtogcp.docx
htogcp.docx
 
20090109 Dsl2cpp Md Workbench
20090109 Dsl2cpp Md Workbench20090109 Dsl2cpp Md Workbench
20090109 Dsl2cpp Md Workbench
 
"PHP from soup to nuts" -- lab exercises
"PHP from soup to nuts" -- lab exercises"PHP from soup to nuts" -- lab exercises
"PHP from soup to nuts" -- lab exercises
 
Fargate 를 이용한 ECS with VPC 1부
Fargate 를 이용한 ECS with VPC 1부Fargate 를 이용한 ECS with VPC 1부
Fargate 를 이용한 ECS with VPC 1부
 
MattsonTutorialSC14.pdf
MattsonTutorialSC14.pdfMattsonTutorialSC14.pdf
MattsonTutorialSC14.pdf
 
C5 c++ development environment
C5 c++ development environmentC5 c++ development environment
C5 c++ development environment
 
Exam 1z0 062 Oracle Database 12c: Installation and Administration
Exam 1z0 062 Oracle Database 12c: Installation and AdministrationExam 1z0 062 Oracle Database 12c: Installation and Administration
Exam 1z0 062 Oracle Database 12c: Installation and Administration
 
Purdue CS354 Operating Systems 2008
Purdue CS354 Operating Systems 2008Purdue CS354 Operating Systems 2008
Purdue CS354 Operating Systems 2008
 
Virtual platform
Virtual platformVirtual platform
Virtual platform
 
Migrating ESM Resources From Oracle to CORR-Engine for ESM 6.5c SP1
Migrating ESM Resources From Oracle to CORR-Engine for ESM 6.5c SP1Migrating ESM Resources From Oracle to CORR-Engine for ESM 6.5c SP1
Migrating ESM Resources From Oracle to CORR-Engine for ESM 6.5c SP1
 

Recently uploaded

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Recently uploaded (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Big Data - Load CSV File & Query the EZ way - HPCC Systems

  • 1. HPCC Systems Loading csv Data & Querying By Fujio Turner @FujioTurner
  • 2. Non-Indexed Full Data Set 1 20 Customers Development Business http://hpccsystems.com/why-hpcc/benchmarks
  • 3. ECL (Enterprise Control Language) C++ based query language SQL w/ JOINS Map/Reduce GraphDB Machine Learning Simple to Complex Queries
  • 4. “I’m sub-second fast.” “I can query all or part of your data.” Architecture Thor Roxie Hard Disk Index(optional) Hard Disk Index(optional) In-memory Index SSD Either/Both
  • 5. Example File Load File into HPCC Query CSV data sample source http://catalog.data.gov/dataset/consumer-complaint-database
  • 6. Administrator Web GUI! on IP / Url of HPCC install Port 8010
  • 7. 4. add ,t 5. 1. Upload file*! 2. Distribute to cluster! 3. Name of file in cluster! 4. Most CSV have t! 5. Push to cluster *2GB file size limit through web No limit if uploaded via SOAP Load !! ! ! Data
  • 8. *optional file rename Loaded In Thor Cluster
  • 9. How do I Query HPCC Systems ? What Is ECL? ECL (Enterprise Control Language) is a C++ based query language for use with HPCC Systems Big Data platform. ECLs syntax and format is very simple and easy to learn.! ! Note - ECL is very similar to Hadoop’s pig ,but! more expressive and feature rich.
  • 10. Query w/ ECL Com := DATASET(‘~test::complaints’,ComS, CSV(HEADING(1), SEPARATOR([',','t']))); ComS :=RECORD UNSIGNED3 ComplaintID; STRING23 Product; STRING38 State; …………………………. …………………………. STRING31Consumer_disputed; END; Ma := Com(State = ‘MA’); Ma; //output WHERE `State` = ‘MA’ File Type File Location,! “FROM Table” “USE DATABASE;” “SELECT * ….” Schema
  • 11. 1. Go to playground! 2. Edit ECL! 3. Pick “thor” Cluster! 4. Submit Practice http://www.meetup.com/HPCC-SV/pages/ECL_EXAMPLE__- _CSV_LOAD_and_QUERY
  • 12. Schema Made EZ http://hpccsystems.com/demos/data-profiling-demo CSV IN Schema Click OUT Storing a new file and want to make a quick schema? ! Take a small part of your CSV data and go to the link below to make an ECL Schema
  • 13. ECL Guide http://hpccsystems.com/download/docs/ecl-language-reference JOIN! MERGE! LENGTH! REGEX! ROUND! SUM! COUNT! TRIM! WHEN! AVE! ABS! CASE! DEDUP! NORMALIZE! DENORMALIZE! IF! SORT! GROUP! more ….
  • 14. For More HPCC! “How To’s”! Go to SlideShare http://www.slideshare.net/FujioTurner/
  • 15. Watch how to install HPCC Systems in 5 Minutes Download HPCC Systems Open Source Community Edition http://hpccsystems.com/download/ http://www.youtube.com/watch?v=8SV43DCUqJg or Source Code https://github.com/hpcc-systems