DB Conan 1.0


The software is available on http://www.box.net/shared/5ag8ja4eli

Published in: Technology

  1. 1. A practical & intelligent enough data retrieval tool Raka Angga Jananuraga [email_address] http://jananuraga.blogspot.com
  2. 2. How it can help you. <ul><li>Get the specific records you want from a table, and from all related tables, quickly and in a repeatable manner. </li></ul><ul><ul><li>You don’t have to crawl the database manually. </li></ul></ul><ul><ul><li>You don’t have to type a single SQL query. </li></ul></ul><ul><ul><li>You get all the results as a collection of CSV files. </li></ul></ul><ul><ul><ul><li>The files are stored in an automatically created folder, ready for packing and sending. </li></ul></ul></ul><ul><ul><li>“Data mozaic-ing” improves collaboration and coordination. </li></ul></ul>
  3. 3. When it can help you. <ul><li>Software testing (esp. data verification) </li></ul><ul><ul><li>You want to make sure the data have the correct (expected) state after the execution of an operation. </li></ul></ul><ul><li>System-comprehension / reverse-engineering: </li></ul><ul><ul><li>You want to know / understand how an operation affects the data. </li></ul></ul>
  4. 4. Screenshots <ul><li>Prototype is a pure console application, written in Python, compiled to EXE using py2exe. </li></ul><ul><li>Consists of five small applications that make up the workflow. </li></ul>
  5. 5. Screenshots (2 of 6)
  6. 6. Screenshots (3 of 6)
  7. 7. Screenshots (4 of 6)
  8. 8. Screenshots (5 of 6)
  9. 9. Screenshots (6 of 6)
  10. 10. Brief description of each app. <ul><li>Schema builder : used to define the subset of the entire database schema that you want to focus your (data) analysis on. </li></ul><ul><li>Exploration planner : used to define the starting point of your queries (over the schema), and filters for each table. </li></ul><ul><li>Exploration runner : used to run the exploration plan. It traverses the schema, returns the matching results from each table, and saves them as CSV files. </li></ul><ul><li>Plan stitcher : used to stitch exploration plans together to form a mozaic (more on this later). </li></ul><ul><li>Mozaic renderer : used to render the mozaic (more on this later). </li></ul>
  11. 11. Workflow (scenario 1) <ul><li>Fire up the schema builder. It will load and analyze the structure of the database. </li></ul><ul><li>In the schema builder, specify the name of a table that you want in your schema. </li></ul><ul><li>Starting from that table, add more tables to your list by following the associations. </li></ul><ul><li>Once your list contains all the tables you’re interested in, save the schema as an SCM file. </li></ul>[Diagram: Build schema -> scm -> Plan exploration -> xpp -> Run exploration -> csv files]
  12. 12. Workflow (scenario 1) – cont’d. <ul><li>Run the exploration planner, feeding it the SCM file you just created. </li></ul><ul><li>In the exploration planner, specify the name of the “starting table”, that is, the table where the crawling will start. </li></ul><ul><li>Additionally, you can specify filters for each table (except the starting table) to limit the results that you’d like to receive. </li></ul><ul><li>Save the plan as an XPP file. </li></ul>[Diagram: Build schema -> scm -> Plan exploration -> xpp -> Run exploration -> csv files]
  13. 13. Workflow (scenario 1) – cont’d. <ul><li>Execute the exploration runner, passing the XPP file you just created as a parameter. </li></ul><ul><li>The exploration runner will ask you to enter the identifier(s) of the record(s) you’d like to match in the starting table. </li></ul><ul><li>Press the ‘R’ key to start the crawling. </li></ul><ul><li>The exploration runner will query all the tables in the schema by following the associations. </li></ul><ul><li>The results of the queries will be stored as CSV files (one for each table). </li></ul>[Diagram: Build schema -> scm -> Plan exploration -> xpp -> Run exploration -> csv files]
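The runner's output step (one CSV file per table, in an automatically created folder) might look roughly like the following Python sketch. This is not DB Conan's actual code: the function `save_results`, the folder name, and the sample rows are all hypothetical and only illustrate the idea.

```python
import csv
import os
import tempfile


def save_results(results, out_dir):
    """Write one CSV file per table.

    `results` maps a table name to a (columns, rows) pair.
    Returns the list of file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if needed
    written = []
    for table, (columns, rows) in sorted(results.items()):
        path = os.path.join(out_dir, table + ".csv")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(columns)  # header row
            writer.writerows(rows)    # the matching records
        written.append(path)
    return written


# Hypothetical query results for two tables.
sample = {
    "EMP": (["EMP_ID", "NAME"], [(1, "Ann"), (2, "Bob")]),
    "DEPT": (["DEPT_ID", "NAME"], [(10, "Sales")]),
}
out_dir = os.path.join(tempfile.mkdtemp(), "exploration_1")
files = save_results(sample, out_dir)
```

Keeping one file per table means each file has a single, fixed set of columns, which is what makes the folder easy to pack and send.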
  14. 14. Workflow (scenario 1) - demo
  15. 15. Workflow (scenario 1) – cmds. <ul><li>Schema builder </li></ul><ul><ul><li>dbconan_schemabuilder.exe -H -p 1521 -n XE -u MYOHMY -a MYOHMY -o MYOHMY </li></ul></ul><ul><ul><ul><li>-H : the host where the Oracle DB is running. </li></ul></ul></ul><ul><ul><ul><li>-p : the port on which the Oracle DB accepts connections. </li></ul></ul></ul><ul><ul><ul><li>-n : the name of the DB. </li></ul></ul></ul><ul><ul><ul><li>-u : the user for accessing the Oracle DB (must have read access to the DB catalog). </li></ul></ul></ul><ul><ul><ul><li>-a : the password. </li></ul></ul></ul><ul><ul><ul><li>-o : the owner of the DB. </li></ul></ul></ul><ul><ul><li>Note : you can pass the -s option, followed by the path to an existing SCM file, to edit an existing schema. </li></ul></ul><ul><li>Exploration planner </li></ul><ul><ul><li>dbconan_explorationplanner.exe -s e:myschema_1.scm </li></ul></ul><ul><ul><ul><li>-s : the path to the SCM file that contains the schema to be queried. </li></ul></ul></ul><ul><ul><li>Note : you can pass the -x option (instead of -s), followed by the path to an existing XPP file, to edit an existing exploration plan. </li></ul></ul><ul><li>Exploration runner </li></ul><ul><ul><li>dbconan_explorationrunner.exe -x e:myexplorationplan_1.xpp -a MYOHMY </li></ul></ul><ul><ul><ul><li>-x : the path to the XPP file to be “run”. </li></ul></ul></ul><ul><ul><ul><li>-a : the password for accessing the Oracle DB. </li></ul></ul></ul><ul><ul><li>Note : you can pass -H, -p, and -n as well. Their values will override the values saved in the XPP. </li></ul></ul>
  16. 16. Workflow (scenario 2) [Diagram: Build schema produces scm (1) and scm (2); Plan exploration produces xpp (1), xpp (2), and xpp (3); Stitch plan combines the plans into an mzc file; Render mozaic produces csv files]
  17. 17. Workflow (scenario 2) – why? <ul><li>Three valid uses: </li></ul><ul><ul><li>As a workaround for some constraints. </li></ul></ul><ul><ul><ul><li>Hint : no cycles are allowed in the schema. </li></ul></ul></ul><ul><ul><li>To load data from multiple databases at once. </li></ul></ul><ul><ul><ul><li>Usually when there are correlations between the content of those separate databases. </li></ul></ul></ul><ul><ul><li>Collaboration & coordination in a team. </li></ul></ul><ul><ul><ul><li>Hint : split-and-(later)-merge way of working together. </li></ul></ul></ul>
  18. 18. Mozaicing overcomes constraint <ul><li>The constraint : your schema must not contain any cycles (the schema builder ensures that your schema does not break this constraint). </li></ul><ul><li>Consequence : you cannot load data of certain structures using a single schema. </li></ul><ul><ul><li>E.g. : the typical “employee-manager” relationship, because it involves a cycle (in this case a loop). </li></ul></ul><ul><ul><ul><li>Don’t worry: we’re talking about the schema of this DBConan application, not your database. You can design your database structure the way you want. </li></ul></ul></ul><ul><li>Workaround : for this particular “employee-manager” case, just as an example, create one schema, create two * exploration plans out of that schema, and then stitch them together as a mozaic. </li></ul><ul><ul><li>The following diagram makes that point clear(er). </li></ul></ul>* Actually, for this case you can manage with only one plan. It depends.
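One way a no-cycle check of this kind could work is sketched below in Python, using a union-find structure. It is only an illustration, not the schema builder's real implementation: the function name `would_create_cycle` is hypothetical, and treating associations as undirected links is an assumption.

```python
def would_create_cycle(links, new_link):
    """Return True if adding new_link to the existing links closes a cycle.

    Links are (table_a, table_b) pairs treated as undirected; this is an
    assumption about how associations are interpreted, not a documented fact.
    """
    parent = {}

    def find(node):
        # Follow parent pointers up to the representative of node's group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        parent[find(a)] = find(b)  # merge groups; existing links assumed cycle-free

    a, b = new_link
    # Both ends already connected (or a self-link): the new link closes a cycle.
    return find(a) == find(b)


# Hypothetical schema: a chain of three associations.
schema_links = [
    ("STUDENT", "CLASSREGISTRATION"),
    ("CLASSREGISTRATION", "CLASS"),
    ("CLASS", "PROFESSOR"),
]
closes_cycle = would_create_cycle(schema_links, ("STUDENT", "PROFESSOR"))
self_loop = would_create_cycle([], ("EMP", "EMP"))
harmless = would_create_cycle(schema_links, ("CLASS", "ROOM"))
```

Under these assumptions the employee-manager self-link is rejected immediately, because both ends of the link are the same table.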
  19. 19. Mozaicing overcomes constraint [Diagram: the (data) mozaic contains mozaic element 1 (Plan 1, Schema 1) and mozaic element 2 (Plan 2, Schema 1); a trigger links the emp table of element 1 to the emp table of element 2]
  20. 20. Mozaicing overcomes constraint (what does the trigger say?) <ul><li>It’s something along these lines: </li></ul><ul><ul><li>From each record found in the employee table in mozaic-element * 1, take the value of the column manager_id . </li></ul></ul><ul><ul><li>Use those values to query the employee table in mozaic-element 2 on the emp_id column. </li></ul></ul><ul><li>In general, you can spell it this way: </li></ul><ul><ul><li>From each record found in table A in mozaic-element X, take the values of the columns that make up key M. </li></ul></ul><ul><ul><li>Use those values to query the starting table of mozaic-element Y, which we’ll refer to as table B. </li></ul></ul><ul><ul><ul><li>Key M must be compatible with the primary key of table B, in the sense that both are made up of the same number of columns. </li></ul></ul></ul><ul><ul><ul><li>Alternatively, you can link by column (you will have to name the columns in table A and table B that will be linked). </li></ul></ul></ul>*) A mozaic-element is just a wrapper around an exploration plan that holds additional attributes (i.e. links to other mozaic-elements).
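The employee-manager trigger described above can be imitated with a small Python sketch. It uses an in-memory SQLite database as a stand-in for Oracle; the helper `fetch_by_column` and the sample rows are hypothetical, and only the column names `emp_id` and `manager_id` come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real Oracle database
conn.execute(
    "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Bob", 1), (3, "Cid", 1), (4, "Dee", 2)],
)


def fetch_by_column(conn, table, column, values):
    """Return the rows of `table` whose `column` matches one of `values`."""
    values = sorted(set(values))
    if not values:
        return []
    marks = ", ".join("?" for _ in values)
    # In this sketch table and column names come from the plan, not user input.
    sql = f"SELECT * FROM {table} WHERE {column} IN ({marks}) ORDER BY {column}"
    return conn.execute(sql, values).fetchall()


# Mozaic-element 1: the employees matched in the starting table.
element_1 = fetch_by_column(conn, "emp", "emp_id", [3, 4])
# Trigger: take manager_id from element 1, then query element 2 on emp_id.
manager_ids = [row[2] for row in element_1 if row[2] is not None]
element_2 = fetch_by_column(conn, "emp", "emp_id", manager_ids)
```

Here `element_2` holds the managers of the employees in `element_1`. Because the two result sets stay separate, the loop in the employee table never has to appear inside a single schema.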
  21. 21. Another example of mozaicing (working around the no-cycle constraint) <ul><li>Suppose we have the following database: </li></ul><ul><li>And our data retrieval task is: given a class, find the professor who delivers it and all the students enrolled in the class. Additionally, find the mentor of each of those students. </li></ul>[Diagram: tables Student, ClassRegistration, Class, and Professor, joined by one-to-many associations; Class is “delivered by” Professor, and Student is “mentored by” Professor]
  22. 22. Another example of mozaicing (working around the no-cycle constraint) – cont’d <ul><li>You’ll have to create two separate schemas, create a plan for each of those schemas, and finally stitch the two plans together (by connecting the STUDENT table in plan A to the PROFESSOR table in plan B, using the mentor_fk defined in the STUDENT table). </li></ul><ul><li>Suppose the mozaic rendering starts from mozaic-element A. Once the exploration of mozaic-element A is completed, the exploration of mozaic-element B will start, using the values extracted from the STUDENT table. </li></ul>[Diagram: schema A with Student, ClassRegistration, Class, and Professor, joined by one-to-many associations; schema B with Professor only]
  23. 23. Why introduce that constraint in the first place? <ul><li>Firstly, from my standpoint as a programmer: dealing with cycles requires more programming, and I didn’t have time. </li></ul><ul><li>Secondly – this is more important, I believe – I don’t want to confuse users by having the results of queries to a table, from various traversal paths, crammed into a single CSV file…, or by having multiple CSV files for each query path and fiddling with funky file naming (e.g.: PROFESSOR_FROM_STUDENT_ROUTE_1.csv – I’ve tried that, and I said to myself wtf?). </li></ul><ul><ul><li>Each mozaic-element has its own folder for storing its CSV files, under a common folder for the mozaic it belongs to. </li></ul></ul><ul><li>So, I managed to convince myself that cutting the cycle is a good idea after all. </li></ul>
  24. 24. Mozaicing to load from multiple databases. <ul><li>This is possible because each plan that you stitch together has its own database information (DB name, DB user, DB owner). </li></ul><ul><ul><li>The only limitation in the current version is that all those databases must have the same password (as password information is not stored anywhere). </li></ul></ul><ul><li>I actually use this capability in my current project, because there is a separate database for each vendor of the products that we’re integrating. </li></ul>
  25. 25. Mozaicing helps collaboration and coordination. <ul><li>In a large system there are several people, each specializing in / focusing on a specific area of the system and looking only at a handful of tables (out of the 500++ tables in the system). </li></ul><ul><li>Of course, those parts don’t work in isolation. There are times when we need to see how a change in one part (as a result of an operation) affects the other parts. </li></ul><ul><li>The team would then put together the schemas from each area / person, define a data-mozaic out of them, and render it. </li></ul>
  26. 26. Data mozaicing - demo
  27. 27. Data mozaicing – cmds. <ul><li>Plan stitcher </li></ul><ul><ul><li>dbconan_explorationplanner.exe e:plan_1.xpp e:plan_2.xpp f:plan_3.xpp </li></ul></ul><ul><ul><ul><li>You can pass a variable number of plans. </li></ul></ul></ul><ul><li>Mozaic renderer </li></ul><ul><ul><li>dbconan_mozaicrenderer.exe -a MYOHMY -m e:mymozaic_1.mzc </li></ul></ul><ul><ul><ul><li>-m : the path to the MZC file to be “rendered”. </li></ul></ul></ul><ul><ul><ul><li>-a : the password for accessing the Oracle DB. </li></ul></ul></ul><ul><ul><li>Note : you can pass -H, -p, and -n as well. Their values will override the values saved in the MZC (but of course this is not recommended if the plans in the mozaic are not all from a single database). </li></ul></ul>
  28. 28. Can I trust the result? <ul><li>Yes, you can. All right, here’s an example graph along with the walk. Suppose the exploration starts from B. Spot the zigzag manner in which the graph is traversed. </li></ul>The walk : B C A F H D G E I K J [Figure: example graph with nodes A to K]
  29. 29. Explanation of the walk. <ul><li>It zigzags. First, mark “going forward” as the current direction (e.g.: from B’s standpoint, the link to C is a forward link). </li></ul><ul><li>Walk along the current direction. You’re zigging. </li></ul><ul><li>Along the way, keep track of the nodes from which there are link(s) going in the opposite direction (backward). Also mark the current node as “visited”, and run its SQL query on the spot. </li></ul><ul><li>Once you hit a dead end, mark “going backward” as the current direction. </li></ul><ul><li>From each and every node noted in step 3, go in the current direction. You’re zagging. Do the same as in step 3, but now keep track of the nodes from which there are link(s) going forward. </li></ul><ul><li>Repeat the zigzagging until all the nodes in the graph are marked as visited. </li></ul>
  30. 30. Explanation of the walk. <ul><li>I found that the sequence of nodes doesn’t really matter. It doesn’t matter whether you go zagging first or zigging first; the end result will be the same. </li></ul><ul><li>However, the way a node (table) is queried depends on whether it’s visited by a walk going “forward” or “backward”: </li></ul><ul><ul><li>If the node is visited by a walk going “forward”, the node will be queried on its primary key, using the values of the foreign key in the previous node in the walk that corresponds to the link connecting both nodes. </li></ul></ul><ul><ul><li>If the node is visited by a walk going “backward”, the node will be queried on its foreign key, using the values of the primary key of the previous node in the walk. </li></ul></ul>
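The zigzag walk described on the last two slides can be approximated with the Python sketch below. It is a reading of the slides, not the tool's real code: the function `zigzag_walk`, the `(child, parent)` link representation (the child table holds the foreign key), and the direction of the foreign keys in the sample schema are all assumptions.

```python
def zigzag_walk(links, start):
    """Visit every table reachable from `start`, alternating directions.

    `links` is a list of (child, parent) pairs; the child table holds a
    foreign key that references the parent's primary key (an assumption).
    Returns (table, direction, previous_table) tuples in visiting order.
    "forward" means: query the table on its primary key, using foreign-key
    values from the previous table. "backward" means: query the table on its
    foreign key, using primary-key values from the previous table.
    """
    forward, backward = {}, {}
    for child, parent in links:
        forward.setdefault(child, []).append(parent)
        backward.setdefault(parent, []).append(child)

    visited = {start}
    steps = [(start, "start", None)]
    frontier = [start]
    direction = "forward"
    while frontier:
        edges = forward if direction == "forward" else backward
        opposite = backward if direction == "forward" else forward
        touched = list(frontier)

        def walk(node):
            # Follow links in the current direction until a dead end.
            for nxt in edges.get(node, []):
                if nxt not in visited:
                    visited.add(nxt)
                    steps.append((nxt, direction, node))
                    touched.append(nxt)
                    walk(nxt)

        for node in frontier:
            walk(node)

        # Keep the nodes that still have unvisited links the other way.
        frontier = [
            node for node in touched
            if any(peer not in visited for peer in opposite.get(node, []))
        ]
        direction = "backward" if direction == "forward" else "forward"
    return steps


# Hypothetical schema; the foreign-key directions are assumed.
links = [
    ("CLASSREGISTRATION", "STUDENT"),
    ("CLASSREGISTRATION", "CLASS"),
    ("CLASS", "PROFESSOR"),
]
steps = zigzag_walk(links, "CLASS")
```

Starting from `CLASS`, this sketch visits `PROFESSOR` going forward, then `CLASSREGISTRATION` going backward, then `STUDENT` going forward again. The no-cycle constraint guarantees that each table is reached by exactly one path, so each table is queried once.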
  31. 31. Downloading and running <ul><li>Download the application from http://www.box.net/shared/5ag8ja4eli </li></ul><ul><li>Unzip it; you’ll find 5 EXE files inside. </li></ul><ul><li>Currently this application works with Oracle databases only. So, here are the possibilities: </li></ul><ul><ul><li>If you already have Oracle Express Edition installed, you don’t have to do anything. Just execute the EXE from the command line. </li></ul></ul><ul><ul><li>If you already have Oracle client 10.1 (or 10.2) installed, you don’t have to do anything else. </li></ul></ul><ul><ul><li>Otherwise: </li></ul></ul><ul><ul><ul><li>Download Oracle Instant Client 10.1 from Oracle’s website (free, but you need to log in; registration is free). </li></ul></ul></ul><ul><ul><ul><li>Unzip the instant client on your computer (e.g.: c:instantclient10.1) </li></ul></ul></ul><ul><ul><ul><li>Execute the following lines on the command line before executing any EXE of DBConan for the first time: </li></ul></ul></ul><ul><ul><ul><ul><li>SET ORACLE_HOME=C:instantclient10.1 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>SET PATH=%ORACLE_HOME%;%PATH% </li></ul></ul></ul></ul>
  32. 32. What’s next? <ul><li>Obviously a rewrite, with a GUI, a better user experience, an improved algorithm for graph analysis, more analytic tools, better output (preferably an MS Excel file instead of a collection of CSV files, which is handier for data analysis that relies a lot on Excel), etc. </li></ul><ul><li>More details on this version, such as a manual and test cases, will be made available on my site: http://jananuraga.blogspot.com </li></ul><ul><li>Feedback and participation in further development are gladly welcome. Please drop me an email: raka.angga@gmail.com </li></ul>
  33. 33. Thank you, gracias, terima kasih, xie xie, matur suksma!