SlideShare a Scribd company logo
1 of 21
Download to read offline
1
• We are building innovative advertising management platforms to assist our customers to get
smarter decisions, reach their business goals faster and better in real time.
• We are proud to have the most cutting edge products and lead the performance and video online
advertising market while striving to build long-term relationships with our clients and partners.
• Our intention is to simplify the complexity existing in the ad-tech industry and provide our
customers with the ability to earn more revenues while using our products and services.
• Edge is led by a team of industry veterans, with Offices in NY, Tel-Aviv and Beijing, and employs
over 100 team members.
2
• Data & BI Team Leader at Edge
• Experienced in wide range of RDBMS technologies
• Working with Hadoop since 2014
• Certified Cloudera Administrator and trainer
• Oracle Certified Professional
3
• Big Data at Edge
• Our Goals
• About Impala
• High Overview
• Why We Chose to Work with Impala
• Our Challenges
• Our Setup
• Designing Impala Tables
4
5
6
• Deliver insights on data in real time
Fraud Detection
Time-series Analysis
Predictive analytics
Interactive exploratory analytics on our data sets
• Provide a convenient way to interact with the data
• Continuously load batches of data, and make them visible with
minimal delay.
• Handle high number of concurrent users
7
• Cloudera's open source massively parallel processing (MPP)
SQL query engine
• Runs on Hadoop clusters
• 100% open source, released under the Apache Software
license
8
• Does not rely on a general purpose
data processing engine such as
MapReduce
• Executes queries directly on the
Hadoop cluster
• Well-suited for executing interactive
analytics queries on large data sets
• Tables are really directories of files
in HDFS
9
• Impala Servers run on each node of a cluster.
• The Impala State Store Server is responsible
for confirming which nodes are healthy and
can accept new work
• The Catalog Server (new in CDH5) is
responsible for sending the new
metadata to all other Impala
nodes
• You can submit a query to the Impala Server running on any node
10
• Supports A large set of SQL statements, including SELECT and
INSERT, JOIN, Subqueries, and SQL Analytic Functions.
• Highly compatible with HiveQL
• Using Cloudera Manager, Impala services can deployed and
managed
• Allows the usage of Hue for queries.
• Impala is certified to run against Tableau
11
• Querying data stored in HDFS (provides a distributed,
high-performance queries)
• Each Impala daemon can handle multiple concurrent client
requests
• Impala is pioneering the use of the Parquet file format, a
columnar storage layout that is optimized for large-scale queries
typical in data warehouse scenarios.
12
• Allows the usage of partitioning
• By default, all the data files for a table are located in a
single directory. Partitioning is a technique for physically
dividing the data during loading, based on values from one
or more columns.
• Impala is a widely adopted standard across the ecosystem,
with many users and extensive documentation
13
• Hadoop Cluster
o Cluster sizing
o Workload testing (query throughput and and response time)
• Database Design
o Identify access pattern based on real use cases
o Make sure we’re not generating too many partitions
o Make sure the data in each partition is large enough
o Design our “Star Schema” data warehouse
• Data Types
o Data consistency across Pig, Hive, and Impala
• File formats
• Tune queries
14
15
● Year = 2017
○ Month = 03
■ Day = 01
■ Day = 02
■ Day = 03
■ …
○ Month = 04
■ Day = 01
■ Day = 02
■ ...
16
• Although we use a “Star Schema” design in Impala. There are a
lot of architectural differences between our Impala layout and
the old RDBMS system.
• Keep that in mind and avoid using your existing RDBMS data
storage and processing strategies in Impala
17
18
19
• Instead of using MapReduce, Impala reads the HDFS data
directly
• Impala allows users to query data in HDFS using an SQL-like
language
• The administrative tasks related to Impala are greatly simplified
by Cloudera Manager
20
• Quickly get started with Cloudera using a preconfigured VM or a
Docker Image
• Impala Frequently Asked Questions
• More details on Apache Parquet
• The Impala Cookbook
21

More Related Content

What's hot

In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentationMichael Keane
 
Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)John Dougherty
 
Conhecendo o Apache HBase
Conhecendo o Apache HBaseConhecendo o Apache HBase
Conhecendo o Apache HBaseFelipe Ferreira
 
in-memory database system and low latency
in-memory database system and low latencyin-memory database system and low latency
in-memory database system and low latencyhyeongchae lee
 
Shard-Query, an MPP database for the cloud using the LAMP stack
Shard-Query, an MPP database for the cloud using the LAMP stackShard-Query, an MPP database for the cloud using the LAMP stack
Shard-Query, an MPP database for the cloud using the LAMP stackJustin Swanhart
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and consSaniya Khalsa
 
Executing Queries on a Sharded Database
Executing Queries on a Sharded DatabaseExecuting Queries on a Sharded Database
Executing Queries on a Sharded DatabaseNeha Narula
 
Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Ashnikbiz
 
Conquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard queryConquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard queryJustin Swanhart
 
Introduction to SharePoint for SQLserver DBAs
Introduction to SharePoint for SQLserver DBAsIntroduction to SharePoint for SQLserver DBAs
Introduction to SharePoint for SQLserver DBAsSteve Knutson
 
Using flash on the server side
Using flash on the server sideUsing flash on the server side
Using flash on the server sideHoward Marks
 
Overview of EnterpriseDB Postgres Plus Advanced Server 9.4 and Postgres Enter...
Overview of EnterpriseDB Postgres Plus Advanced Server 9.4 and Postgres Enter...Overview of EnterpriseDB Postgres Plus Advanced Server 9.4 and Postgres Enter...
Overview of EnterpriseDB Postgres Plus Advanced Server 9.4 and Postgres Enter...EDB
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs FasterBob Ward
 
Column db dol
Column db dolColumn db dol
Column db dolpoojabi
 
MySQL conference 2010 ignite talk on InfiniDB
MySQL conference 2010 ignite talk on InfiniDBMySQL conference 2010 ignite talk on InfiniDB
MySQL conference 2010 ignite talk on InfiniDBCalpont
 
Migration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLMigration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLPGConf APAC
 
Hello World with EDB Postgres
Hello World with EDB PostgresHello World with EDB Postgres
Hello World with EDB PostgresEDB
 

What's hot (20)

In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)
 
Conhecendo o Apache HBase
Conhecendo o Apache HBaseConhecendo o Apache HBase
Conhecendo o Apache HBase
 
in-memory database system and low latency
in-memory database system and low latencyin-memory database system and low latency
in-memory database system and low latency
 
Shard-Query, an MPP database for the cloud using the LAMP stack
Shard-Query, an MPP database for the cloud using the LAMP stackShard-Query, an MPP database for the cloud using the LAMP stack
Shard-Query, an MPP database for the cloud using the LAMP stack
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and cons
 
AWS Database Services
AWS Database ServicesAWS Database Services
AWS Database Services
 
Executing Queries on a Sharded Database
Executing Queries on a Sharded DatabaseExecuting Queries on a Sharded Database
Executing Queries on a Sharded Database
 
Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus
 
Conquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard queryConquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard query
 
Introduction to SharePoint for SQLserver DBAs
Introduction to SharePoint for SQLserver DBAsIntroduction to SharePoint for SQLserver DBAs
Introduction to SharePoint for SQLserver DBAs
 
Using flash on the server side
Using flash on the server sideUsing flash on the server side
Using flash on the server side
 
Overview of EnterpriseDB Postgres Plus Advanced Server 9.4 and Postgres Enter...
Overview of EnterpriseDB Postgres Plus Advanced Server 9.4 and Postgres Enter...Overview of EnterpriseDB Postgres Plus Advanced Server 9.4 and Postgres Enter...
Overview of EnterpriseDB Postgres Plus Advanced Server 9.4 and Postgres Enter...
 
NoSQL Seminer
NoSQL SeminerNoSQL Seminer
NoSQL Seminer
 
Sql over hadoop ver 3
Sql over hadoop ver 3Sql over hadoop ver 3
Sql over hadoop ver 3
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs Faster
 
Column db dol
Column db dolColumn db dol
Column db dol
 
MySQL conference 2010 ignite talk on InfiniDB
MySQL conference 2010 ignite talk on InfiniDBMySQL conference 2010 ignite talk on InfiniDB
MySQL conference 2010 ignite talk on InfiniDB
 
Migration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLMigration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQL
 
Hello World with EDB Postgres
Hello World with EDB PostgresHello World with EDB Postgres
Hello World with EDB Postgres
 

Similar to Impala use case @ edge

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopCloudera, Inc.
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Global Business Events
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Cloudera, Inc.
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightTillmann Eitelberg
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game ChangerCaserta
 
New big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxNew big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxVanshGupta597842
 

Similar to Impala use case @ edge (20)

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
New big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxNew big data architecture in hadoop.pptx
New big data architecture in hadoop.pptx
 

More from Ram Kedem

Advanced SQL Webinar
Advanced SQL WebinarAdvanced SQL Webinar
Advanced SQL WebinarRam Kedem
 
Managing oracle Database Instance
Managing oracle Database InstanceManaging oracle Database Instance
Managing oracle Database InstanceRam Kedem
 
Power Pivot and Power View
Power Pivot and Power ViewPower Pivot and Power View
Power Pivot and Power ViewRam Kedem
 
Data Mining in SSAS
Data Mining in SSASData Mining in SSAS
Data Mining in SSASRam Kedem
 
Data mining In SSAS
Data mining In SSASData mining In SSAS
Data mining In SSASRam Kedem
 
SQL Injections - Oracle
SQL Injections - OracleSQL Injections - Oracle
SQL Injections - OracleRam Kedem
 
SSAS Attributes
SSAS AttributesSSAS Attributes
SSAS AttributesRam Kedem
 
DDL Practice (Hebrew)
DDL Practice (Hebrew)DDL Practice (Hebrew)
DDL Practice (Hebrew)Ram Kedem
 
DML Practice (Hebrew)
DML Practice (Hebrew)DML Practice (Hebrew)
DML Practice (Hebrew)Ram Kedem
 
Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Ram Kedem
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQLRam Kedem
 
Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Ram Kedem
 
Pig - Processing XML data
Pig - Processing XML dataPig - Processing XML data
Pig - Processing XML dataRam Kedem
 
SSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesSSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesRam Kedem
 
SSRS Basic Parameters
SSRS Basic ParametersSSRS Basic Parameters
SSRS Basic ParametersRam Kedem
 
SSRS Conditional Formatting
SSRS Conditional FormattingSSRS Conditional Formatting
SSRS Conditional FormattingRam Kedem
 
SSRS Calculated Fields
SSRS Calculated FieldsSSRS Calculated Fields
SSRS Calculated FieldsRam Kedem
 

More from Ram Kedem (20)

Advanced SQL Webinar
Advanced SQL WebinarAdvanced SQL Webinar
Advanced SQL Webinar
 
Managing oracle Database Instance
Managing oracle Database InstanceManaging oracle Database Instance
Managing oracle Database Instance
 
Power Pivot and Power View
Power Pivot and Power ViewPower Pivot and Power View
Power Pivot and Power View
 
Data Mining in SSAS
Data Mining in SSASData Mining in SSAS
Data Mining in SSAS
 
Data mining In SSAS
Data mining In SSASData mining In SSAS
Data mining In SSAS
 
SQL Injections - Oracle
SQL Injections - OracleSQL Injections - Oracle
SQL Injections - Oracle
 
SSAS Attributes
SSAS AttributesSSAS Attributes
SSAS Attributes
 
SSRS Matrix
SSRS MatrixSSRS Matrix
SSRS Matrix
 
DDL Practice (Hebrew)
DDL Practice (Hebrew)DDL Practice (Hebrew)
DDL Practice (Hebrew)
 
DML Practice (Hebrew)
DML Practice (Hebrew)DML Practice (Hebrew)
DML Practice (Hebrew)
 
Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQL
 
Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014
 
Pig - Processing XML data
Pig - Processing XML dataPig - Processing XML data
Pig - Processing XML data
 
SSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesSSAS Cubes & Hierarchies
SSAS Cubes & Hierarchies
 
SSRS Basic Parameters
SSRS Basic ParametersSSRS Basic Parameters
SSRS Basic Parameters
 
SSRS Gauges
SSRS GaugesSSRS Gauges
SSRS Gauges
 
SSRS Conditional Formatting
SSRS Conditional FormattingSSRS Conditional Formatting
SSRS Conditional Formatting
 
SSRS Calculated Fields
SSRS Calculated FieldsSSRS Calculated Fields
SSRS Calculated Fields
 
SSRS Groups
SSRS GroupsSSRS Groups
SSRS Groups
 

Recently uploaded

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Impala use case @ edge

  • 1. 1
  • 2. • We are building innovative advertising management platforms to assist our customers to get smarter decisions, reach their business goals faster and better in real time. • We are proud to have the most cutting edge products and lead the performance and video online advertising market while striving to build long-term relationships with our clients and partners. • Our intention is to simplify the complexity existing in the ad-tech industry and provide our customers with the ability to earn more revenues while using our products and services. • Edge is led by a team of industry veterans, with Offices in NY, Tel-Aviv and Beijing, and employs over 100 team members. 2
  • 3. • Data & BI Team Leader at Edge • Experienced in wide range of RDBMS technologies • Working with Hadoop since 2014 • Certified Cloudera Administrator and trainer • Oracle Certified Professional 3
  • 4. • Big Data at Edge • Our Goals • About Impala • High Overview • Why We Chose to Work with Impala • Our Challenges • Our Setup • Designing Impala Tables 4
  • 5. 5
  • 6. 6 • Deliver insights on data in real time Fraud Detection Time-series Analysis Predictive analytics Interactive exploratory analytics on our data sets • Provide a convenient way to interact with the data • Continuously load batches of data, and make them visible with minimal delay. • Handle high number of concurrent users
  • 7. 7 • Cloudera's open source massively parallel processing (MPP) SQL query engine • Runs on Hadoop clusters • 100% open source, released under the Apache Software license
  • 8. 8 • Does not rely on a general purpose data processing engine such as MapReduce • Executes queries directly on the Hadoop cluster • Well-suited for executing interactive analytics queries on large data sets • Tables are really directories of files in HDFS
  • 9. 9 • Impala Servers run on each node of a cluster. • The Impala State Store Server is responsible for confirming which nodes are healthy and can accept new work • The Catalog Server (new in CDH5) is responsible for sending the new metadata to all other Impala nodes • You can submit a query to the Impala Server running on any node
  • 10. 10 • Supports A large set of SQL statements, including SELECT and INSERT, JOIN, Subqueries, and SQL Analytic Functions. • Highly compatible with HiveQL • Using Cloudera Manager, Impala services can deployed and managed • Allows the usage of Hue for queries. • Impala is certified to run against Tableau
  • 11. 11 • Querying data stored in HDFS (provides a distributed, high-performance queries) • Each Impala daemon can handle multiple concurrent client requests • Impala is pioneering the use of the Parquet file format, a columnar storage layout that is optimized for large-scale queries typical in data warehouse scenarios.
  • 12. 12 • Allows the usage of partitioning • By default, all the data files for a table are located in a single directory. Partitioning is a technique for physically dividing the data during loading, based on values from one or more columns. • Impala is a widely adopted standard across the ecosystem, with many users and extensive documentation
  • 13. 13 • Hadoop Cluster o Cluster sizing o Workload testing (query throughput and and response time) • Database Design o Identify access pattern based on real use cases o Make sure we’re not generating too many partitions o Make sure the data in each partition is large enough o Design our “Star Schema” data warehouse • Data Types o Data consistency across Pig, Hive, and Impala • File formats • Tune queries
  • 14. 14
  • 15. 15 ● Year = 2017 ○ Month = 03 ■ Day = 01 ■ Day = 02 ■ Day = 03 ■ … ○ Month = 04 ■ Day = 01 ■ Day = 02 ■ ...
  • 16. 16 • Although we use a “Star Schema” design in Impala. There are a lot of architectural differences between our Impala layout and the old RDBMS system. • Keep that in mind and avoid using your existing RDBMS data storage and processing strategies in Impala
  • 17. 17
  • 18. 18
  • 19. 19 • Instead of using MapReduce, Impala reads the HDFS data directly • Impala allows users to query data in HDFS using an SQL-like language • The administrative tasks related to Impala are greatly simplified by Cloudera Manager
  • 20. 20 • Quickly get started with Cloudera using a preconfigured VM or a Docker Image • Impala Frequently Asked Questions • More details on Apache Parquet • The Impala Cookbook
  • 21. 21