1. CloudGraph ®
MySql to HBase in 5 Steps
Converting MySql or Oracle databases to Apache HBase™ with on-line
examples using the popular Wordnet® dictionary
Scott Cinnamond – TerraMeta Software Inc.
http://cloudgraph.org
2. What is Wordnet®?
• Large complex lexical (MySql) database of
English.
• Nouns, verbs, adjectives and adverbs
grouped into sets of cognitive synonyms
(synsets), each expressing a distinct
concept.
• Synsets are interlinked by means of
conceptual-semantic and lexical relations.
3. HBase Conversion Steps
http://wordnet.cloudgraph.org
1) Model Creation: reverse engineer Wordnet DB
into UML®
2) Code Generation: provision persistence and
query-DSL java code
3) HBase™ Table Mapping: map data graphs and
row keys to table(s)
4) Data Migration: MySql to HBase
5) Services / App Creation: build services,
web app
4. 1.) Model Creation
Reverse engineer Wordnet DB into PlasmaSDO™ UML® Model
• Capture entities, properties, data types,
associations, enumerations, comments as UML
• Why UML? Popular standards-based format.
Editable, viewable using standard tools.
Supports enterprise governance processes
• How? Maven build with plasma-maven-plugin
RDB tool (goal:RDB, action:reverse, dialect:mysql)
• Download working example at
https://github.com/cloudgraph/wordnet
6. 2.) Code Generation
Provision SDO persistence and query DSL java code
• Generate Java API based on Wordnet UML
Model
• Why? Use across RDB, HBase, other
CloudGraph Services. Compile time checking for
queries, all persistence logic
• How? Maven build with plasma-maven-plugin
SDO and DSL tools
• See generated API Javadocs on-line at
http://wordnet.cloudgraph.org
7. 3.) HBase™ Table Mapping
Map data graphs and row keys to HBase™ table(s)
• Configure delimited, hashed, salted, formatted,
composite row keys with (xpath) paths into
target data graphs
• Map data graph roots to HBase tables
• Why? Automates row-key creation via data
extraction processing from anywhere in your
data graphs
• How? CloudGraph Configuration XML. See
https://github.com/cloudgraph/wordnet
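To make the row-key options above concrete, here is a minimal plain-Java sketch of a salted, delimited, formatted composite key. It is not CloudGraph's configuration or API: the class name, the field names (lemma, pos, synsetId), the "|" delimiter, the 16 salt buckets and the MD5-based hashed field are all assumptions chosen for illustration. In CloudGraph itself these choices are declared in the configuration XML rather than hand-coded.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: every name and constant here is hypothetical.
final class RowKeySketch {
    static final String DELIMITER = "|";   // assumed field delimiter
    static final int SALT_BUCKETS = 16;    // assumed number of salt buckets

    /** Salted prefix: a two-digit bucket derived from the value, to spread rows across regions. */
    static String salt(String value) {
        int bucket = Math.floorMod(value.hashCode(), SALT_BUCKETS);
        return String.format("%02d", bucket);
    }

    /** Hashed field: the first 4 bytes of an MD5 digest, rendered as 8 hex characters. */
    static String hashed(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                sb.append(String.format("%02x", digest[i] & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Composite key: salt, lemma, part of speech and a zero-padded (formatted) synset id. */
    static String compositeKey(String lemma, String pos, long synsetId) {
        return salt(lemma) + DELIMITER + lemma + DELIMITER + pos
                + DELIMITER + String.format("%010d", synsetId);
    }

    public static void main(String[] args) {
        System.out.println(compositeKey("dog", "n", 42L));
        System.out.println(hashed("dog"));
    }
}
```

Placing the lemma directly after the salt keeps keys for the same word adjacent within a bucket, which suits prefix-style lookups; a hashed field trades that locality for a fixed key width.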
8. 4.) Data Migration
MySql to HBase
• Create a standalone RDB-to-HBase
migration app that uses the generated
persistence and DSL query API to
incrementally call the CloudGraph HBase
and RDB services
• Why? Wordnet data is large and highly
connected, so must be incrementally
extracted/inserted and linked
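One way to picture incremental extraction and linking is a two-pass copy: first insert nodes in small batches, then add links once every endpoint exists. The sketch below uses in-memory lists and maps as stand-ins for the MySql source and the HBase target. It does not use the generated CloudGraph API, and the class names, fields and batch size are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: in-memory stand-ins for the RDB source and HBase target.
final class MigrationSketch {
    /** Stand-in for a source row: a synset plus the ids of related synsets. */
    static final class SourceSynset {
        final long id;
        final String gloss;
        final List<Long> relatedIds;
        SourceSynset(long id, String gloss, List<Long> relatedIds) {
            this.id = id;
            this.gloss = gloss;
            this.relatedIds = relatedIds;
        }
    }

    /** Stand-in for a migrated row; links are filled in during the second pass. */
    static final class TargetSynset {
        final long id;
        final String gloss;
        final List<Long> links = new ArrayList<>();
        TargetSynset(long id, String gloss) {
            this.id = id;
            this.gloss = gloss;
        }
    }

    static Map<Long, TargetSynset> migrate(List<SourceSynset> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<Long, TargetSynset> target = new LinkedHashMap<>();
        // Pass 1: extract and insert nodes one batch at a time, without links.
        for (int start = 0; start < source.size(); start += batchSize) {
            int end = Math.min(start + batchSize, source.size());
            for (SourceSynset s : source.subList(start, end)) {
                target.put(s.id, new TargetSynset(s.id, s.gloss));
            }
        }
        // Pass 2: link nodes, again batch by batch; ids with no migrated node are skipped.
        for (int start = 0; start < source.size(); start += batchSize) {
            int end = Math.min(start + batchSize, source.size());
            for (SourceSynset s : source.subList(start, end)) {
                TargetSynset t = target.get(s.id);
                for (Long relatedId : s.relatedIds) {
                    if (target.containsKey(relatedId)) {
                        t.links.add(relatedId);
                    }
                }
            }
        }
        return target;
    }
}
```

Separating insertion from linking means cyclic relations, for example two synsets that refer to each other, never require both rows to be loaded in the same batch.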
9. 5.) Services / App Creation
Build services, web app
• Build simple pojo services using
persistence and DSL query API
• Encapsulate Wordnet business logic
• Add adapter/wrapper structures
• Call services from the web app
10. Web App
http://wordnet.cloudgraph.org
• Auto-complete field triggers CloudGraph
HBase to use the HBase fuzzy row filter
API
• Find button returns all semantic and
lexical relations for the selected word,
including descriptions and example
sentences
• Resulting relation graphs typically contain
more than 100 nodes and return in less
than 200 milliseconds
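The idea behind fuzzy row matching for auto-complete can be sketched in plain Java. In HBase's fuzzy row filter, a mask byte of 0 marks a position that must match the pattern and a mask byte of 1 marks a position that may hold any byte. The code below is a simplified in-memory model of that rule, not the HBase API and not CloudGraph's code. The key layout (a two-character salt, a "|" delimiter, then the lemma) is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a simplified model of mask-based row-key matching.
final class FuzzyMatchSketch {
    /** mask[i] == 0: rowKey[i] must equal pattern[i]; mask[i] == 1: any byte is accepted. */
    static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (rowKey.length < pattern.length) {
            return false; // simplification: shorter keys never match
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the keys whose lemma starts with the typed text, whatever their salt. */
    static List<String> complete(List<String> rowKeys, String typed) {
        // Assumed key layout: "<2-char salt>|<lemma>". The "??" bytes are placeholders.
        byte[] pattern = ("??|" + typed).getBytes(StandardCharsets.US_ASCII);
        byte[] mask = new byte[pattern.length]; // all zeros: every position fixed
        mask[0] = 1;                            // salt positions are wildcards
        mask[1] = 1;
        List<String> hits = new ArrayList<>();
        for (String key : rowKeys) {
            if (matches(key.getBytes(StandardCharsets.US_ASCII), pattern, mask)) {
                hits.add(key);
            }
        }
        return hits;
    }
}
```

Because the salt positions are masked out, one pattern covers every salt bucket, so a typed prefix does not need a separate scan per bucket.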
11. Conclusions
• Complex, highly recursive RDB models
can be easily converted and leveraged in
HBase and future CloudGraph services
• Large lexical data graphs can be returned
in a single query
• Data migration is difficult given the complex
recursive model
12. Resources
• Download the complete CloudGraph Wordnet
example: https://github.com/cloudgraph/wordnet
• Run the example online:
http://wordnet.cloudgraph.org
• Project details, contact information:
http://cloudgraph.org
• Beta Source Repo:
https://github.com/terrameta/cloudgraph
• Production Source Repo (under construction):
https://github.com/cloudgraph
13. Status / Legal
• Project Status
– CloudGraph ® is currently under private beta testing
• Licensing
– CloudGraph ® 0.5.5 Community Edition (CE) is open source licensed
under version 2 of the GNU General Public License
• Trademarks
– WordNet ® is a registered trademark of Princeton University
– Apache HBase™ is a trademark of the Apache Software Foundation
– CloudGraph ® is a trademark of TerraMeta Software LLC, TerraMeta
Software Inc.