Hands-on demo of PDI using webSpoon

webSpoon is a web-based version of Spoon, the graphical designer for Pentaho Data Integration (PDI).
This slide deck was used at the Pentaho Bay Area Meetup on 4/27/17 (http://meetu.ps/e/CFNKy/cb4hM/f).

Published in: Software

  1. Hands-on demo of PDI using webSpoon
     Hiromu Hota, PhD, Researcher at Hitachi America, Ltd.
     @HiromuHota, hiromu.hota@hal.hitachi.com
     4/27/2017
     © Hitachi America, Ltd. 2017. All rights reserved.
  2. Get started with webSpoon
  3. How to get started with webSpoon
     1. Visit https://HighlyAvailable-env.i8gkiqhycy.us-west-2.elasticbeanstalk.com (will be deleted after the meetup)
     2. Log in with User: user, Password: password
     3. From the top menu, click File > New > Transformation
  4. Basic Concepts of PDI
     • Transformations
       – are data flows, which typically start from data sources, go through some processing, and end at a target database table.
       – consist of steps and hops.
       – are saved as .ktr (Kettle) files or to a repository.
     • Steps and Hops
       – Each step is designed for a specific task such as input, output, or scripting.
       – Hops are directed data pathways that connect steps.
     (Figure labels: Step, Hop, Trans.ktr, Repository, Save)
  5. How to operate webSpoon
     • Drawing Steps
       1. Under the Design tab, expand the Input node, then click and drag a Generate random credit card numbers step onto the canvas.
       2. Expand the Flow node, then click and drag a Dummy (do nothing) step onto the canvas.
     • Drawing Hops (similar to Spoon)
       1. Press and hold the <SHIFT> key.
       2. Click and hold the Generate random credit card numbers step.
       3. Move the mouse cursor to the Dummy (do nothing) step.
       4. Release the mouse button and the key.
  6. Example demo
  7. Demo story
     • Background
       – Ichiro Hitachi works for a travel agency based in San Francisco.
       – He wants to offer an additional benefit to his customers, who are tourists.
       – He personally likes to visit filming locations when he visits a new place, so he strongly believes that such information is useful for them too.
     • Movie location notifier
       – When his customers come close to a filming location, they receive a notification that gives the title, year, short plot, actor, and address.
     • Example notifications
       – Godzilla (2014): "He attacked GGB"; Golden Gate Bridge
       – Forrest Gump (1994): "He has accidentally been present at many historic moments"; 3301 Lyon Street
     (Figure: cropped Map of San Francisco by Ryan Holliday / CC-BY-SA 4.0)
  8. Source data: “Film Locations in San Francisco”
     • Source data
       – Available on SF OpenData (https://data.sfgov.org/).
       – A list of filming locations of movies shot in San Francisco.
     • Web APIs to retrieve missing information
       – OMDb (Open Movie Database) API
         • Short plot of the movie
       – Google Maps API
         • Formatted (normalized) address (e.g., Palace of Fine Arts -> 3301 Lyon Street)
         • Latitude & Longitude of the location, to calculate the distance from each user
     • Sample rows
         Title         Year  Locations            Actor1  ...
         Godzilla      2014  Kearney & Pine St.
         Forrest Gump  1994  Palace of Fine Arts
         ...
  9. High-level demo system architecture
     (Diagram labels)
     • Components: webSpoon, SF OpenData, Organizer, Participants, Database, Google Maps API, OMDb API
     • Data flows: Raw data, Operations, Enriched data, Specific location data, Geo data, Movie data
     • Annotation: Not covered today
  10. Exercise (step 1)
      1. Open an example file and save it under a different name
         1. Click File > Open, select example2, then click OK
         2. Click File > Save as, change the Transformation name to something unique (so that others do not overwrite it), then click OK
  11. Exercise (step 2)
      2. Run
         1. Click the Run button or Action > Run from the menu
         2. Click the Run button at the bottom right
      (Screenshot callouts: Step 3.1, Step 3.2)
  12. Exercise (step 3)
      3. Preview the result
         1. Click on the “Dummy (do nothing)” step
         2. Click on the “Preview data” tab in the “Execution Results” at the bottom
         3. See other steps
  13. Exercise (step 4)
      4. Complete the data flow by enabling the disabled hop
         1. Click on the hop between “Dummy (do nothing)” and “Filter out rows...”
         2. Save, Run, and preview the result
  14. Exercise (step 5)
      5. Explore the rest yourself; for example,
         – Click on each step and see how it is configured
         – Explore what kinds of steps are available
         – Design the exact same flow yourself
         – Download and deploy webSpoon
           • Docker image: https://hub.docker.com/r/hiromuhota/webspoon/
           • WAR file: https://github.com/HiromuHota/pentaho-kettle/releases
         – Download and install Pentaho Data Integration (including Spoon)
           • http://www.pentaho.com/download (Enterprise Edition)
           • http://community.pentaho.com/ (Community Edition)
  15. Trademarks and copyrights
      • Pentaho is a registered trademark of Pentaho, Inc.
      • AWS, Amazon Elastic Beanstalk, and any other AWS Marks and Services are trademarks of Amazon Web Services, Inc.
      • The use of AWS Simple Icons is permitted by Amazon Web Services, Inc.
      • Godzilla is a registered trademark of Toho Co., Ltd.
      • Google Maps is a trademark of Google Inc.
      • All content via OMDb API is licensed by Brian Fritz under CC BY-NC 4.0.
  16. Demo system architecture
      (Diagram labels: webSpoon, Classic Load Balancer, Auto Scaling group, Elastic Beanstalk, AWS cloud, SF OpenData, Organizer, Participants, Database, Geo data, Movie data)
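
The "Basic Concepts of PDI" slide describes a transformation as steps connected by directed hops, and exercise step 4 relies on a hop being disabled or enabled. The following is a minimal Python sketch of that idea, not PDI's actual object model: the `Transformation` class, its methods, and the `enabled` flag are hypothetical names chosen for illustration. It only shows which steps data can reach from a starting step by following enabled hops.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Transformation:
    """Toy model (hypothetical, not the PDI API): steps joined by directed hops."""
    name: str
    steps: list = field(default_factory=list)
    hops: list = field(default_factory=list)  # tuples of (from_step, to_step, enabled)

    def add_step(self, step):
        if step not in self.steps:
            self.steps.append(step)

    def add_hop(self, src, dst, enabled=True):
        # A hop is a directed pathway, so both ends must already be steps.
        if src not in self.steps or dst not in self.steps:
            raise ValueError("both ends of a hop must be existing steps")
        self.hops.append((src, dst, enabled))

    def reachable_from(self, start):
        """Return the steps that rows can reach from `start` via enabled hops."""
        seen = [start]
        queue = deque([start])
        while queue:
            step = queue.popleft()
            for src, dst, enabled in self.hops:
                if enabled and src == step and dst not in seen:
                    seen.append(dst)
                    queue.append(dst)
        return seen


# Mirror the exercise: the hop into the filter step starts out disabled.
trans = Transformation("example2-copy")
for step_name in ("Generate random credit card numbers",
                  "Dummy (do nothing)",
                  "Filter out rows..."):
    trans.add_step(step_name)
trans.add_hop("Generate random credit card numbers", "Dummy (do nothing)")
trans.add_hop("Dummy (do nothing)", "Filter out rows...", enabled=False)
```

With the second hop disabled, `trans.reachable_from("Generate random credit card numbers")` stops at the Dummy step; adding the same hop with `enabled=True` lets rows flow on to the filter step.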
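
The source-data slide says latitude and longitude are used "to calculate the distance from each user", and the demo story notifies customers who come close to a filming location. The slides do not say how the distance is computed, so the sketch below assumes the haversine great-circle formula. The function names, the 0.5 km radius, and the coordinates are illustrative assumptions (the coordinates are approximate values, not taken from the dataset).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_locations(user_lat, user_lon, locations, radius_km=0.5):
    """Return the locations within radius_km of the user (radius is an assumption)."""
    return [
        loc for loc in locations
        if haversine_km(user_lat, user_lon, loc["lat"], loc["lon"]) <= radius_km
    ]


# Approximate coordinates, for illustration only.
locations = [
    {"title": "Godzilla", "year": 2014, "address": "Golden Gate Bridge",
     "lat": 37.8199, "lon": -122.4783},
    {"title": "Forrest Gump", "year": 1994, "address": "3301 Lyon Street",
     "lat": 37.8029, "lon": -122.4484},
]

# A user standing at roughly the second location.
hits = nearby_locations(37.8029, -122.4484, locations)
```

In this example `hits` contains only the Forrest Gump entry, because the other location is a few kilometres away. A real notifier would also need the plot text from the OMDb API and the normalized address from the Google Maps API, which this sketch leaves out.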
