1. Thesis presentation
   Yakham NDIAYE, November 13, 2001
   Interoperability of a Scalable Distributed Data Manager with an Object-Relational DBMS
2. Objective
   • Develop techniques for the interoperability of a DBMS with an external SDDS file.
   • Examine the various architectural issues that make such a coupling most efficient.
   • Validate our technical choices through prototyping and experimental performance analysis.
   • Our approach lies at the crossroads of main-memory DBMSs, object-relational DBMSs with foreign functions, and distributed/parallel DBMSs.
3. Plan
   • Multicomputers
   • SDDSs
   • AMOS-II & DB2 DBMSs
   • Coupling SDDS and AMOS-II
   • Coupling SDDS and DB2
   • Experimental analysis
   • Conclusion
4. Multicomputers
   • A collection of loosely coupled computers
     - interconnected by high-speed local area networks.
   • Cost/performance
     - potentially offers storage and processing capabilities rivaling a supercomputer at a fraction of the cost.
   • New architectural concepts
     - offer applications the cumulated CPU and storage capabilities of a large number of interconnected computers.
5. SDDS
   • New data structures designed specifically for multicomputers.
   • Data are structured
     - records with keys
     - parallel scans & function shipping
   • Data are on servers
     - waiting for access
   • Overflowing servers split into new servers
     - appended to the file without informing the clients
   • Queries come from multiple autonomous clients
     - access initiators
     - no centralized directory is used for access computations
   • See http://ceria.dauphine.fr for more.
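The split-without-informing-clients behavior above can be sketched with the linear-hashing addressing used by LH*-like SDDSs. This is a minimal single-process sketch under a tiny bucket capacity; the class and method names are illustrative, not the actual SDDS manager's API.

```python
# Minimal sketch of LH*-style linear-hashing addressing, the scheme behind
# many SDDSs. All names are illustrative, not the SDDS manager's real API.

class SDDSFile:
    def __init__(self):
        self.i = 0           # file level: buckets addressed by h_i or h_{i+1}
        self.n = 0           # split pointer: next bucket to split
        self.buckets = [{}]  # bucket 0
        self.capacity = 4    # records per bucket before a split (tiny, for demo)

    def _address(self, key):
        # h_i, corrected by h_{i+1} for buckets already split in this round.
        a = key % (2 ** self.i)
        if a < self.n:
            a = key % (2 ** (self.i + 1))
        return a

    def insert(self, key, record):
        self.buckets[self._address(key)][key] = record
        if len(self.buckets[self._address(key)]) > self.capacity:
            self._split()

    def _split(self):
        # Split bucket n into n and n + 2^i; clients are NOT notified --
        # an out-of-date client is simply redirected by the server it hits.
        old = self.buckets[self.n]
        self.buckets.append({})
        new_addr = self.n + 2 ** self.i
        for k in list(old):
            if k % (2 ** (self.i + 1)) == new_addr:
                self.buckets[new_addr][k] = old.pop(k)
        self.n += 1
        if self.n == 2 ** self.i:  # round finished: double the address space
            self.i += 1
            self.n = 0

    def lookup(self, key):
        return self.buckets[self._address(key)].get(key)
```

The file grows one bucket at a time as servers overflow, so the address space scales without any centralized directory.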
6. AMOS-II DBMS
   • AMOS-II: Active Mediating Object System.
   • A main-memory database system.
   • Declarative query language: AMOSQL.
   • Capability to access external data sources.
   • External programs interface with AMOS-II using:
     - call-level interface (call-in)
     - foreign functions (call-out)
   • See the AMOS-II page for more: http://www.dis.uu.se/~udbl/
7. DB2 Universal Database
   • IBM's object-relational DBMS, « DB2 Universal Database ».
   • A typical representative of commercial object-relational DBMSs.
   • Can handle external data through user-defined functions (UDFs).
8. Coupling Strategies
   • AMOS-SDDS strategy:
     - a scalable RAM file supporting database queries;
     - use a DBMS for manipulations best handled through its query language;
     - direct fast data access for manipulations supported poorly, or not at all, by a DBMS;
     - distributed query processing with function shipping.
9. AMOS-SDDS System
   [Figure: AMOS-SDDS scalable parallel query processing]
10. Coupling Strategies
   • SD-AMOS strategy:
     - uses AMOS-II as the memory manager at each SDDS storage site;
     - a scalable generalization of a parallel DBMS;
     - data partitioning becomes dynamic.
11. SD-AMOS System
   [Figure: SD-AMOS scalable parallel query processing]
12. Coupling SDDS & DB2
   • DB2-SDDS strategy:
     - coupling of a DBMS with an external data repository offering direct fast data access;
     - use of an SDDS file by a DBMS as an external data repository;
     - offers the user an interface more elaborate than that of the SDDS manager, in particular through its query language.
13. Coupling SDDS & DB2
   [Figure: DB2-SDDS overall architecture]
   Register a user-defined external table function:

     CREATE FUNCTION scan(Varchar(20))
     RETURNS TABLE (ssn Integer, name Varchar(20), city Varchar(20))
     EXTERNAL NAME 'interface!fullscan'
14. Coupling SDDS & DB2
   Foreign functions to access SDDS records from DB2:
     range(cleMin, cleMax) -> list of the records whose key satisfies cleMin < key < cleMax
     scan(nom_fichier)     -> list of all the records of the file

   Sample queries:
   • Parallel scan of all SDDS records:
       SELECT * FROM TABLE( scan('fichier') ) AS table_sdds(SSN, NAME, CITY)
   • Range query, SDDS records with key between 1 and 100:
       SELECT * FROM TABLE( range(1, 100) ) AS table_sdds(SSN, NAME, CITY) ORDER BY Name
15. The Hardware
   • Six Pentium III 700 MHz machines with 256 MB of RAM, running Windows 2000.
   • A 100 Mbit/s Ethernet network.
   • One site is used as the client, the five others as servers.
   • We run several servers on the same machine (up to 3 per machine).
   • The file scaled from 1 to 15 servers.
16. Benchmark queries
   • Benchmark data:
     - table Person (SS#, Name, City);
     - size 20,000 to 300,000 tuples of 25 bytes;
     - 50 cities;
     - random distribution.
   • Benchmark query: « couples of persons in the same city »
     - Query 1: the file resides at a single AMOS-II.
     - Query 2: the file resides at AMOS-SDDS.
     - Join evaluation: two strategies.
   • Measures:
     - speed-up & scale-up;
     - processing time of aggregate functions.
17. Server Query Processing
   • E-strategy
     - Data stay external to AMOS, within the SDDS bucket.
     - Custom foreign functions perform the query.
   • I-strategy
     - Data are dynamically imported into AMOS-II:
       » possibly with local index creation;
       » deleted after the processing;
       » good for joins.
     - AMOS performs the query.
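The two strategies can be contrasted on the benchmark join « couples of persons in the same city ». Below is a minimal sketch in plain Python standing in for the SDDS bucket and the AMOS-II engine; all function and variable names are hypothetical.

```python
# Illustrative contrast of the two server-side strategies on the benchmark
# join "couples of persons in the same city". Plain Python stands in for
# the SDDS bucket and for AMOS-II; all names are hypothetical.

def join_external(bucket):
    """E-strategy: a foreign function runs a nested loop directly over
    the raw bucket records, without importing them anywhere."""
    pairs = []
    for i, (ssn1, name1, city1) in enumerate(bucket):
        for ssn2, name2, city2 in bucket[i + 1:]:
            if city1 == city2:
                pairs.append((name1, name2))
    return pairs

def join_imported(bucket):
    """I-strategy: records are first imported and indexed on City (as
    AMOS-II would build a local index), then joined by index lookup.
    The imported data and the index are dropped after processing."""
    index = {}   # city -> names seen so far
    pairs = []
    for ssn, name, city in bucket:
        for earlier in index.get(city, []):
            pairs.append((earlier, name))
        index.setdefault(city, []).append(name)
    return pairs  # index discarded on return
```

Both produce the same pairs; the index turns the inner loop into a hash lookup, which is why the I-strategy wins for joins once the one-time importation cost is paid.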
18. Speed-up
   Elapsed time of Query 2 according to the strategy, for a file of 20,000 records distributed over 1 to 5 servers.

   E-strategy for Query 2: elapsed time
   Server nodes          1      2     3     4     5
   Elapsed time (s)      1,344  681   468   358   288
   Time per tuple (ms)   67.2   34    23.4  17.9  14.4

   I-strategy for Query 2: elapsed time
   Server nodes          1      2     3     4     5
   Nested loop (s)       128    78    64    55    48
   Index lookup (s)      60     39    37    36    32
19. Discussion
   • The results show an important advantage of the I-strategy over the E-strategy for the evaluation of the join query.
   • For 5 servers, the gain is 6-fold for the nested loop, and 9-fold if an index is created.
   • This favorable result led us to study the scale-up characteristics of AMOS-SDDS on a file scaling up to 300,000 tuples.
20. Scaling the number of servers
   Elapsed time of join queries to AMOS-SDDS.
   Q1 = AMOS-SDDS join; Q2 = AMOS-SDDS join with count.
   Time per tuple (extrapolated for AMOS-SDDS):

   File size           20,000  60,000  100,000  160,000  200,000  240,000  300,000
   # SDDS servers      1       3       5        8        10       12       15
   Q1 (ms)             3.05    5.02    6.84     11.36    12.77    16.25    18.55
   Q2 (ms)             2.55    3.08    3.35     6.16     6.39     8.43     8.75
   Q1 w. extrap. (ms)  3.05    5.02    6.84     8.28     9.6      10.64    12.72
   Q2 w. extrap. (ms)  2.55    3.08    3.35     3.11     3.2      2.84     2.94
   AMOS-II (ms)        2.30    7.17    12.01    19.41    24.12    29.08    36.44
21. Scaling the number of servers
   • Results are extrapolated to 1 server per machine:
     - basically, the CPU component of the elapsed time is divided by 3.
   • The extrapolated processing time of the join query with count shows linear scalability of the system.
   • The processing time per tuple remains constant (2.94 ms) when the file size and the number of servers increase by the same factor.

   [Figure: Expected time per tuple of join queries to AMOS-SDDS]
22. Aggregate Function count
   Elapsed time of aggregate function Count over a 100,000-tuple file on AMOS-SDDS.
   Elapsed time for AMOS-II = 280 ms.

   # servers         1      2    3    4    5
   E-strategy (ms)   10     10   10   10   10
   I-strategy (ms)   1,462  761  511  440  341
23. Aggregate Function max
   Elapsed time of aggregate function Max over a 100,000-tuple file on AMOS-SDDS.
   Elapsed time for AMOS-II = 471 ms.

   # servers         1      2    3    4    5
   E-strategy (ms)   420    210  140  110  90
   I-strategy (ms)   1,663  831  561  491  390
24. Discussion
   • Contrary to the join query, the external strategy wins for the evaluation of aggregate functions.
   • For the count function, the improvement is about 34 times.
   • For the max function, the improvement is about 4 times.
   • This is due to the importation cost, and to an SDDS property: the current number of records is a parameter of each bucket.
   • Linear speed-up: processing time decreases with the number of servers.
   • The use of external functions can thus be very advantageous for certain kinds of operations.
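The SDDS property above, that each bucket carries its current record count as a parameter, is what makes the external Count nearly free: it reduces to summing one integer per bucket instead of importing every record. A minimal sketch, with illustrative names:

```python
# Why the external Count is nearly free: each SDDS bucket already keeps
# its record count as a parameter, so Count reduces to summing one integer
# per bucket instead of importing every record. Names are illustrative.

class Bucket:
    def __init__(self):
        self.records = {}
        self.count = 0   # maintained on every insert, a bucket parameter

    def insert(self, key, record):
        if key not in self.records:
            self.count += 1
        self.records[key] = record

def count_external(buckets):
    # E-strategy: ship the function; each server answers with one counter.
    return sum(b.count for b in buckets)

def count_imported(buckets):
    # I-strategy: import all records first, then count them -- the
    # importation cost dominates, as the measurements show.
    imported = [r for b in buckets for r in b.records.values()]
    return len(imported)
```

Max has no such precomputed bucket parameter, so the E-strategy must still scan each bucket; the gain there comes only from avoiding the importation, hence 4 times rather than 34.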
25. SD-AMOS performance measurements
   Creation time of a 3,000,000-record file. The bucket size is 750,000 records of 100 bytes.
   [Figure: Global and moving-average insertion time of a record]
26. SD-AMOS performance measurements
   [Figure: Elapsed time of range query; average time per tuple]
27. Discussion
   • The average insertion time of a record, splits included, is 0.15 ms.
   • The average access time to a record in a distributed file is 0.12 ms.
     - This is 100 times faster than with a traditional file on disk.
   • Linear scalability: the insertion time and the access time per tuple remain constant when the file size and the number of servers increase.
28. DB2-SDDS performance measurements
   [Figure: Elapsed time of range query; time per tuple]
   (i) access time to the data in a DB2 table, (ii) access time to the SDDS file through the DB2 external functions (DB2-SDDS), and (iii) direct access time to the SDDS file from an SDDS client.
29. Discussion
   • Direct access to the SDDS file is much faster than access to a DB2 table: 0.02 ms versus 0.07 ms.
   • Access to external data from DB2 (0.08 ms) is slower than access to internal data (0.07 ms): this is the coupling cost.
   • An application thus has:
     - fast direct access to the data;
     - through the DBMS, access via the query language.
30. Conclusion
   • We have coupled an SDDS manager with the main-memory DBMS AMOS-II and with DB2, to improve the current technologies for high-performance databases and for coupling with external data repositories.
   • The experiments reported in the thesis prove the efficiency of the system.
   • AMOS-SDDS and DB2-SDDS: use of an SDDS file by a DBMS, with parallel query processing on the server sites.
   • SD-AMOS: a scalable generalization of a parallel main-memory DBMS where the data partitioning becomes automatic.
31. Future Work
   • Other types of DBMS queries.
   • A scalable distributed query decomposer on the client.
   • A particularly challenging task is the design of a scalable distributed query optimizer handling the dynamic data partitioning.
32. End
   Thank You for Your Attention
   CERIA, Université Paris IX Dauphine
   [email_address]
