Breadth or Depth
What's in a column-store?

February 23, 2013           Jeff Smith
This presentation
Is not           Is
marketing        persuasive
technical        for the technical
arbitrary        precise
polite           opinionated
training         educational
Seriously
Bio
{ past :[startups, biotech, data_management],
school : [research, HKU, uncertain_data],
work : [AI, finance, prediction] }
This guy




Daniel Abadi
Back to the future
● 1 database to rule them all
● A scrappy band of rebels
● A brave new idea
The big question
Why grab this?
id    thing    attr1   attr2   attr3   attr4   attr5   attr6   attr7   attr8

123   doodad   abc     def     ghi     jkl     mno     pqr     stu     vwx


When all you want is this?
id    thing

123   doodad
You're chopping it wrong.
Relations in pieces
 id    pet     weight   poops_per_day

 1     dog     40       3

 2     cat     15       2

 3     bird    5        4

 4     snake   78       0.25
Horizontal Partitions
 id     pet     weight   poops_per_day

 1      dog     40       3


 2      cat     15       2


 3      bird    5        4


 4      snake   78       0.25
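
A minimal Python sketch of horizontal partitioning, using the pets relation from the slide. The `horizontal_partition` helper and the id-modulo scheme with two partitions are assumptions for illustration; the talk does not say how rows are assigned.

```python
# Pets relation from the slide, one tuple per row.
COLUMNS = ("id", "pet", "weight", "poops_per_day")
ROWS = [
    (1, "dog", 40, 3),
    (2, "cat", 15, 2),
    (3, "bird", 5, 4),
    (4, "snake", 78, 0.25),
]


def horizontal_partition(rows, num_partitions):
    """Assign whole rows to partitions by id modulo num_partitions.

    The modulo scheme is a hypothetical choice; every partition keeps
    all four columns of the rows it holds.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row[0] % num_partitions].append(row)
    return partitions


partitions = horizontal_partition(ROWS, 2)
for index, part in enumerate(partitions):
    print(index, part)
```

Each partition is still a complete, narrower-in-rows copy of the relation, so reading two columns still drags all four along.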
You gotta get yourself some marble columns.
Vertical Partitions
 id   pet     weight   poops_per_day

 1    dog     40       3

 2    cat     15       2

 3    bird    5        4

 4    snake   78       0.25
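
A minimal Python sketch of vertical partitioning on the same pets relation. The helper names `vertical_partition` and `project` are assumptions for illustration, not part of any real column-store API.

```python
# Pets relation from the slide, one tuple per row.
COLUMNS = ("id", "pet", "weight", "poops_per_day")
ROWS = [
    (1, "dog", 40, 3),
    (2, "cat", 15, 2),
    (3, "bird", 5, 4),
    (4, "snake", 78, 0.25),
]


def vertical_partition(columns, rows):
    """Store each column as its own list, in row order."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def project(store, names):
    """Rebuild rows from the requested columns only.

    Columns that are not named are never read.
    """
    return list(zip(*(store[name] for name in names)))


store = vertical_partition(COLUMNS, ROWS)
id_and_pet = project(store, ["id", "pet"])
print(id_and_pet)
```

This is the answer to the earlier question: a query that wants only `id` and `pet` touches two lists and leaves `weight` and `poops_per_day` alone.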
We're gonna need a bigger table.
BigTable
NoSQL starts
Empire crumbles
Nomenclature obfuscates
I know that song!
Column...families?!
 Pets                                          Cars


 row_id   best_pet   worst_pet   illegal_pet   row_id   make    model

 123      bulldog    turtle      rhino         123      Smart   Fortwo
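
One way to picture the column-family layout above is as nested maps: row key, then family, then column. The sketch below is a hypothetical in-memory model in Python, not the storage format of any particular system; `read_family` is an illustrative helper.

```python
# Hypothetical model: row key -> column family -> column -> value.
table = {
    "123": {
        "Pets": {
            "best_pet": "bulldog",
            "worst_pet": "turtle",
            "illegal_pet": "rhino",
        },
        "Cars": {"make": "Smart", "model": "Fortwo"},
    }
}


def read_family(table, row_key, family):
    """Return a copy of one family's columns for a row, or {} if absent."""
    return dict(table.get(row_key, {}).get(family, {}))


cars = read_family(table, "123", "Cars")
print(cars)
```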
Modest Map
Year of the snake =>   Year of Python
4G =>                  LTE
NoSQL =>               Non-relational
Beard =>               Face-mane
Column-stores =>       {column-store |
                       column-family-store}
Does it smell as sweet?
C-Store rocks*
...at column-oriented tasks.




               * Contrary to popular belief, after years
               of effort, Cleveland still does not rock.
Move, b*tch.
Get out the vote.
 age

 23

 32

 45

 67

 56

 49

 43

 50

 63

 34
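
One kind of column-oriented task is an aggregate that touches a single attribute. A minimal Python sketch over the age column shown on the slide; the function names and the cutoff of 50 are assumptions for illustration, not from the talk.

```python
# The age column from the slide, stored on its own.
ages = [23, 32, 45, 67, 56, 49, 43, 50, 63, 34]


def mean(values):
    """Arithmetic mean of a non-empty column."""
    return sum(values) / len(values)


def count_at_least(values, cutoff):
    """Number of values greater than or equal to cutoff."""
    return sum(1 for value in values if value >= cutoff)


average_age = mean(ages)
older_voters = count_at_least(ages, 50)  # cutoff is a hypothetical choice
print(average_age, older_voters)
```

Neither function needs any other attribute of the underlying rows, which is why a single contiguous column is a good fit.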
The catch
Attack of the clones
The contenders
HBase*
Cassandra*
Hypertable
Accumulo




             * The ones that matter
HBase
Hadoop stack
Java everywhere
Components,
extensions, variables,
headaches...
Tastes like SQL
SELECT sensorid, (20-down)/(up-down) AS probability
FROM hive_sensors
WHERE down >= 10 AND up >= 20 AND down <= 20
UNION ALL
SELECT sensorid, (up-10)/(up-down) AS probability
FROM hive_sensors
WHERE up >= 10 AND up <= 20 AND down <= 10
UNION ALL
SELECT sensorid, 1 AS probability
FROM hive_sensors
WHERE up <= 20 AND down >= 10
UNION ALL
SELECT sensorid, (20-10)/(up-down) AS probability
FROM hive_sensors
WHERE down <= 10 AND up >= 20;
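
Read as a whole, the four branches appear to compute the fraction of each sensor's interval [down, up] that falls inside the range [10, 20]. The Python sketch below states that reading directly. The function name, the default bounds, and the handling of non-overlapping or empty intervals are assumptions; the query itself simply returns no row when there is no overlap.

```python
def interval_probability(down, up, low=10, high=20):
    """Fraction of the interval [down, up] that lies inside [low, high].

    Assumes a uniform distribution over [down, up]. Returns 0.0 when the
    two ranges do not overlap (the query would emit no row in that case).
    """
    if up <= down:
        raise ValueError("expected down < up")
    overlap = min(up, high) - max(down, low)
    return max(0.0, overlap) / (up - down)


print(interval_probability(15, 25))  # starts inside the range, ends above it
print(interval_probability(5, 15))   # starts below the range, ends inside it
print(interval_probability(12, 18))  # entirely inside the range
print(interval_probability(0, 40))   # covers the whole range
```

Note that the WHERE clauses in the query share their boundary values, so an interval sitting exactly on a boundary (for example down = 10 and up = 20) matches more than one branch and comes back more than once.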
Cassandra
CQL interface
Peer to peer
Better, but...
Anything you can do, I can do better.
Sparseness
id   attr1   attr2   attr3   attr4

1    1

2                            1

3            1

4                    1

5

6            1

7

8                            1

9    1

10

11
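
A minimal Python sketch of why sparseness matters, using the cells set in the table above. The dictionary layout and the `to_dense` helper are assumptions for illustration: a dense layout reserves a cell for every attribute of every row, while the sparse one stores only the cells that exist.

```python
ATTRS = ["attr1", "attr2", "attr3", "attr4"]
ROW_IDS = range(1, 12)  # ids 1 through 11, as in the table

# Only the cells that actually hold a value; rows 5, 7, 10 and 11 are empty.
sparse = {
    1: {"attr1": 1},
    2: {"attr4": 1},
    3: {"attr2": 1},
    4: {"attr3": 1},
    6: {"attr2": 1},
    8: {"attr4": 1},
    9: {"attr1": 1},
}


def to_dense(sparse, row_ids, attrs):
    """Expand to a full grid, with None for every missing cell."""
    return [[sparse.get(row_id, {}).get(attr) for attr in attrs]
            for row_id in row_ids]


dense = to_dense(sparse, ROW_IDS, ATTRS)
dense_cells = sum(len(row) for row in dense)
stored_cells = sum(len(columns) for columns in sparse.values())
print(dense_cells, stored_cells)
```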
Dynamic Schemas
Pets                                                      Cars


row_id   best_pet   worst_pet   illegal_pet   robot_pet   row_id   make    model

123      bulldog    turtle      rhino         aibo        123      Smart   Fortwo

456      shih tzu   gecko       koala                     456      VW      Golf
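
A minimal Python sketch of a dynamic schema, mirroring the Pets family above: `robot_pet` shows up on row 123 without any schema change, and row 456 never stores it. The `ColumnFamily` class is a toy written for this example, not the API of HBase or Cassandra.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: each row stores only the columns it has."""

    def __init__(self, name):
        self.name = name
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Write one cell; unseen column names need no declaration."""
        self.rows[row_key][column] = value

    def columns(self):
        """Sorted names of every column seen in any row so far."""
        return sorted({col for row in self.rows.values() for col in row})


pets = ColumnFamily("Pets")
pets.put("123", "best_pet", "bulldog")
pets.put("123", "worst_pet", "turtle")
pets.put("123", "illegal_pet", "rhino")
pets.put("123", "robot_pet", "aibo")  # new column, added on write
pets.put("456", "best_pet", "shih tzu")
pets.put("456", "worst_pet", "gecko")
pets.put("456", "illegal_pet", "koala")

print(pets.columns())
print(pets.rows["456"])
```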
Stronger in the broken places
Innovation
Truly distributed systems
Columns as metadata
Arbitrarily deep column hierarchies*
Community database development




                       * Someday soon, I hope
Pig & friends
-- Pig: load every column in family cf1 as a map, with the row key as id
data = load 'hbase://table_name' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*', '-loadKey true')
    AS (id:chararray, stats:map[int]);

# udfs.py (Jython UDF): turn the map into a bag of (key, value) tuples
@outputSchema("values:bag{t:tuple(key, value)}")
def bag_of_tuples(map_dict):
    return map_dict.items()

-- Pig: register the UDF, then flatten each map into one row per column
register 'udfs.py' using jython as py;
data = load 'hbase://table_name' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*', '-loadKey true')
    AS (id:chararray, stats:map[int]);
databag = foreach data generate id,
    FLATTEN(py.bag_of_tuples(stats));

                                           from Chase Seibert
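
To see what the script produces without a cluster, the sketch below emulates it in plain Python: one output row per (key, value) pair in each row's map. The sample rows and the `flatten_rows` helper are made up for illustration; only `bag_of_tuples` comes from the slide.

```python
def bag_of_tuples(map_dict):
    """Same idea as the UDF on the slide: a map becomes (key, value) pairs."""
    return list(map_dict.items())


def flatten_rows(rows):
    """Emulate `foreach data generate id, FLATTEN(...)`.

    Each (id, stats) input row yields one (id, key, value) row per map entry.
    """
    for row_id, stats in rows:
        for key, value in bag_of_tuples(stats):
            yield (row_id, key, value)


# Hypothetical rows, shaped like (id:chararray, stats:map[int]).
sample = [
    ("row1", {"clicks": 3, "views": 10}),
    ("row2", {"clicks": 1}),
]

flattened = sorted(flatten_rows(sample))
for row in flattened:
    print(row)
```

The wide, variable set of columns in `cf1` turns into a narrow three-field relation that the rest of a Pig script can group and filter as usual.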
No dog in this fight
Hey I just met you
  And this is crazy
  But here's my email
  Mail me maybe



Work                                         Play

jeff@aidyia.com         jeffreyksmithjr@gmail.com
Disclaimer
All images used in this presentation were stolen
from the internet in a daring midnight raid that
left 3 dead and 8 wounded. No license was
obtained for their use and no license is implied
by their misappropriation.

Yarrr. BarrrCamp.

Please don't sue me. I have nothing. Just a
dog. Don't take my dog.
