CloudGraph ®

MySql to HBase in 5 Steps
Converting MySql or Oracle databases to Apache HBase™ with on-line
examples using the popular Wordnet® dictionary
Scott Cinnamond – TerraMeta Software Inc.
http://cloudgraph.org
What is Wordnet®?

• Large complex lexical (MySql) database of
English.
• Nouns, verbs, adjectives and adverbs
grouped into sets of cognitive synonyms
(synsets), each expressing a distinct
concept.
• Synsets are interlinked by means of
conceptual-semantic and lexical relations.
HBase Conversion Steps
http://wordnet.cloudgraph.org

1) Model Creation: reverse engineer Wordnet DB
into UML®

2) Code Generation: provision persistence and
query-DSL java code

3) HBase™ Table Mapping: map data graphs and
row keys to table(s)

4) Data Migration: MySql to HBase
5) Services / App Creation: build services,
web app
1.) Model Creation
Reverse engineer Wordnet DB into PlasmaSDO™ UML® Model

• Capture entities, properties, data types,
associations, enumerations, comments as UML
• Why UML? Popular standards-based format.
Editable, viewable using standard tools.
Supports enterprise governance processes
• How? Maven build with plasma-maven-plugin
RDB tool (goal:RDB, action:reverse, dialect:mysql)
• Download working example at
https://github.com/cloudgraph/wordnet
Generated Wordnet Model
(core subset of 30 total entities and enumerations)
2.) Code Generation
Provision SDO persistence and query DSL java code

• Generate Java API based on Wordnet UML
Model
• Why? Use across RDB, HBase, other
CloudGraph Services. Compile time checking for
queries, all persistence logic
• How? Maven build with plasma-maven-plugin
SDO and DSL tools
• See generated API Javadocs on-line at
http://wordnet.cloudgraph.org
3.) HBase™ Table Mapping
Map data graphs and row keys to HBase™ table(s)

• Configure delimited, hashed, salted, formatted,
composite row keys with (xpath) paths into
target data graphs
• Map data graph roots to HBase tables
• Why? Automates row-key creation via data
extraction processing from anywhere in your
data graphs
• How? CloudGraph Configuration XML. See
https://github.com/cloudgraph/wordnet
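
To make the row-key options above concrete, here is a minimal plain-Java sketch of a salted, delimited, formatted composite key. It is not the CloudGraph configuration or API: the class `RowKeySketch`, the field names (`lemma`, `partOfSpeech`, `senseNum`), the `|` delimiter, the bucket count, and the use of MD5 are all assumptions chosen for illustration. In CloudGraph the field values would be pulled from the data graph by the configured paths; here they are simply passed in.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not part of CloudGraph or HBase.
class RowKeySketch {
    static final int SALT_BUCKETS = 16;   // assumed number of salt buckets
    static final String DELIM = "|";      // assumed field delimiter

    // Salted field: a small, deterministic prefix that spreads keys across regions.
    static String salt(String value) {
        int bucket = Math.floorMod(value.hashCode(), SALT_BUCKETS);
        return String.format("%02d", bucket);
    }

    // Hashed field: fixed-width hex token derived from an arbitrary-length value.
    static String hashed(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                sb.append(String.format("%02x", digest[i] & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Composite key: salt, then delimited fields, with a zero-padded (formatted) number.
    static String compose(String lemma, String partOfSpeech, int senseNum) {
        return salt(lemma) + DELIM + lemma + DELIM + partOfSpeech
                + DELIM + String.format("%03d", senseNum);
    }

    public static void main(String[] args) {
        System.out.println(compose("dog", "n", 1));
        System.out.println(hashed("a domesticated canine"));
    }
}
```

The salt is derived from the key's own content rather than drawn at random, so the same word always maps to the same key and can be looked up again without a scan.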
4.) Data Migration
MySql to HBase

• Create RDB-to-HBase standalone
migration app using generated
persistence and DSL query API
that incrementally calls CloudGraph HBase and
RDB services
• Why? Wordnet data is large and highly
connected, so must be incrementally
extracted/inserted and linked
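
One common way to migrate highly connected data incrementally is a two-pass, batched copy: insert every node first, then add the links once all targets exist. The sketch below shows that pattern in plain Java against in-memory collections. It is an assumption about the general approach, not the migration app from the example repository; `SourceRow`, `TargetNode`, and `migrate` are hypothetical names, and the map stands in for the HBase side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; in-memory stand-ins for the RDB and HBase sides.
class MigrationSketch {
    static class SourceRow {
        final int id;
        final String lemma;
        final List<Integer> relatedIds;
        SourceRow(int id, String lemma, Integer... relatedIds) {
            this.id = id;
            this.lemma = lemma;
            this.relatedIds = Arrays.asList(relatedIds);
        }
    }

    static class TargetNode {
        final String lemma;
        final Set<Integer> links = new LinkedHashSet<>();
        TargetNode(String lemma) { this.lemma = lemma; }
    }

    // Copies rows in batches, then links them in batches. Returns the batch count.
    static int migrate(List<SourceRow> source, Map<Integer, TargetNode> target, int batchSize) {
        int batches = 0;
        // Pass 1: insert nodes without links, so forward references cannot fail.
        for (int start = 0; start < source.size(); start += batchSize) {
            int end = Math.min(start + batchSize, source.size());
            for (SourceRow row : source.subList(start, end)) {
                target.put(row.id, new TargetNode(row.lemma));
            }
            batches++;
        }
        // Pass 2: add links, skipping any reference whose target was never migrated.
        for (int start = 0; start < source.size(); start += batchSize) {
            int end = Math.min(start + batchSize, source.size());
            for (SourceRow row : source.subList(start, end)) {
                for (Integer relatedId : row.relatedIds) {
                    if (target.containsKey(relatedId)) {
                        target.get(row.id).links.add(relatedId);
                    }
                }
            }
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<SourceRow> source = new ArrayList<>();
        source.add(new SourceRow(1, "dog", 2, 3));   // row 3 is a forward reference
        source.add(new SourceRow(2, "canine", 1));
        source.add(new SourceRow(3, "puppy", 1));
        Map<Integer, TargetNode> target = new LinkedHashMap<>();
        int batches = migrate(source, target, 2);
        System.out.println(batches + " batches, dog links: " + target.get(1).links);
    }
}
```

Splitting insert and link into separate passes keeps each batch small while still resolving references that point at rows in a later batch.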
5.) Services / App Creation
Build services, web app

• Build simple pojo services using
persistence and DSL query API
• Encapsulate Wordnet business logic
• Add adapter/wrapper structures
• Call services from the web app
Web App
http://wordnet.cloudgraph.org

• Auto-complete field triggers CloudGraph
HBase to use the HBase fuzzy row filter
API
• Find button returns all semantic and
lexical relations for the selected word,
including descriptions and example
sentences
• Resulting relation graphs typically contain
more than 100 nodes and return in less
than 200 milliseconds
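
The fuzzy row filter mentioned above matches row keys against a pattern plus a per-byte mask, where a mask byte of 0 means the position is fixed and must match, and 1 means any byte is accepted. The plain-Java sketch below imitates that matching rule so it can run without an HBase cluster. How CloudGraph actually builds its pattern and mask is not described in these slides; the key layout (a two-character salt, then `|`, then the word) and the class `FuzzyMatchSketch` are assumptions for illustration, and requiring the key to be at least as long as the pattern is a simplification.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of fuzzy row-key matching; does not call HBase.
class FuzzyMatchSketch {
    // mask[i] == 0: rowKey[i] must equal pattern[i]; mask[i] == 1: any byte matches.
    static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (rowKey.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the keys whose leading bytes satisfy the pattern and mask.
    static List<String> filter(List<String> keys, String pattern, byte[] mask) {
        byte[] p = pattern.getBytes(StandardCharsets.UTF_8);
        List<String> hits = new ArrayList<>();
        for (String key : keys) {
            if (matches(key.getBytes(StandardCharsets.UTF_8), p, mask)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("07|dog|n|001", "03|dogma|n|001", "11|cat|n|001");
        // The user typed "dog": salt positions are wildcards, "|dog" is fixed.
        byte[] mask = {1, 1, 0, 0, 0, 0};
        System.out.println(filter(keys, "00|dog", mask));
    }
}
```

With this layout, an auto-complete lookup for a typed prefix can ignore the unknown salt bytes while still constraining the word itself.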
Conclusions
• Complex, highly recursive RDB models
can be easily converted and leveraged in
HBase and future CloudGraph services
• Large lexical data graphs can be returned
in a single query
• Data migration is difficult given the complex
recursive model
Resources
• Download the complete CloudGraph Wordnet
example: https://github.com/cloudgraph/wordnet
• Run the example online:
http://wordnet.cloudgraph.org
• Project details, contact information:
http://cloudgraph.org
• Beta Source Repo:
https://github.com/terrameta/cloudgraph
• Production Source Repo (under construction):
https://github.com/cloudgraph
Status / Legal
• Project Status
– CloudGraph ® is currently under private beta testing
• Licensing
– CloudGraph ® 0.5.5 Community Edition (CE) is open source licensed
under version 2 of the GNU General Public License
• Trademarks
– WordNet ® is a registered trademark of Princeton University
– Apache HBase™ is a trademark of Apache Software Foundation
– CloudGraph ® is a trademark of TerraMeta Software LLC, TerraMeta
Software Inc.
