An easy one Q7: Provide a list of rare star-like objects. <ul><li>Found  14,681 buckets,  first 140 buckets have 99%  time...
An Easy One Q15: Find asteroids  <ul><li>Sounds hard but  there are 5 pictures of the object at 5 different times (color f...
<ul><li>Find near earth asteroids: </li></ul><ul><li>Finds 3 objects in 11 minutes </li></ul><ul><ul><li>(or 52 seconds wi...
 
 
 
Performance (on current SDSS data) <ul><li>Run times: on 15k$  COMPAQ  Server  (2 cpu, 1 GB , 8 disk) </li></ul><ul><li>So...
Sequential Scan Speed is Important <ul><li>In high-dimension data, best way is to search. </li></ul><ul><li>Sequential sca...
What we learned from the 20 Queries <ul><li>All have fairly short SQL programs --  a substantial advance over (tcl, C++)  ...
Cosmo:  Computing the Cosmological Constant <ul><li>Compares simulated galaxy distribution  to observed distribution </li>...
Summary <ul><li>We will be able to store everything, </li></ul><ul><ul><li>The challenge is organizing and finding answers...
References <ul><li>These Slides </li></ul><ul><ul><li>http://research.Microsoft.com/~Gray/talks/ </li></ul></ul><ul><li>Te...
Databases and Dot.Net "SQL To The Max" keynote


  1. 1. Building Petabyte Databases: SQL + .Net. Jim Gray, Microsoft Research, http://research.microsoft.com/~gray/talks. VSlive! SQL To The Max, 15 February 2002 @ San Francisco. Objects are closer than they appear in the mirror. PhotoServer: Tom Barclay, Ya Feng Sung. TerraServer: Tom Barclay, USGS. SkyServer: Alex Szalay, Ani Thakar, Peter Kunszt, Tanu Malik, Jordan Raddick, Don Slutz, Jan vandenBerg. Some slides: Robert Brunner.
  2. 2. SQLserver™: Past and Future History <ul><li>SQL 2000 </li></ul><ul><ul><li>SQL </li></ul></ul><ul><ul><li>XML </li></ul></ul><ul><ul><li>Replication x, y, z,… </li></ul></ul><ul><ul><li>Auto Admin </li></ul></ul><ul><ul><li>Data Transformation </li></ul></ul><ul><ul><li>OLAP </li></ul></ul><ul><ul><li>Data Mining </li></ul></ul><ul><ul><li>Text Indexing </li></ul></ul><ul><ul><li>English Query </li></ul></ul><ul><ul><li>Partitioning </li></ul></ul><ul><ul><li>Clusters </li></ul></ul><ul><li>SQL 200x </li></ul><ul><ul><li>Beta late this year </li></ul></ul><ul><ul><li>Trustworthy: Availability Privacy Security </li></ul></ul><ul><ul><li>CLR (objects) </li></ul></ul><ul><ul><li>XML (xQuery,….) </li></ul></ul><ul><ul><li>Unify Files & Records </li></ul></ul><ul><ul><li>Manageability, </li></ul></ul><ul><ul><li>Scalability </li></ul></ul><ul><li>.Net </li></ul><ul><ul><li>XML schema support </li></ul></ul><ul><ul><li>updategrams </li></ul></ul><ul><ul><li>More xPath support </li></ul></ul><ul><ul><li>SPs and templates as web services </li></ul></ul>WebReference.soap proxy = new WebReference.soap(); object[] results1 = proxy.StoredProcedure (inParam, ref inoutParam, out returnValue); object[] results2 = proxy.Template(inParam);
  3. 3. Outline <ul><li>We will be able to store everything, </li></ul><ul><ul><li>How do we represent it? (objects) </li></ul></ul><ul><ul><li>How will we find it (aka: who cares?) </li></ul></ul><ul><li>PhotoServer: Objects vs records vs files, </li></ul><ul><ul><li>XML++ gives us portable objects. </li></ul></ul><ul><ul><li>Similarity search: better than nothing! </li></ul></ul><ul><li>Scalability: a solved problem, </li></ul><ul><ul><li>but… Trustworthy & Manageable is not. </li></ul></ul><ul><li>TerraServer and TerraService </li></ul><ul><ul><li>Why put everything in the database? </li></ul></ul><ul><ul><li>A prototypical Web Service. </li></ul></ul><ul><li>SkyServer and the World Wide Telescope </li></ul><ul><ul><li>Data Mining science data </li></ul></ul><ul><ul><li>Serving Windows/Macintosh/Unix clients with .Net </li></ul></ul><ul><ul><li>Federating Archives with .Net </li></ul></ul>
  4. 4. Record Everything? What’s that? <ul><li>Disks will get 100x to 1,000x more capacity </li></ul><ul><ul><li>10x to 30x more bandwidth. </li></ul></ul><ul><li>Other technologies in the wings: </li></ul><ul><ul><li>MRAM, MEMS, … </li></ul></ul><ul><li>The 20TB … 200TB disk drive! </li></ul><ul><ul><li>Library of Congress (books) </li></ul></ul><ul><ul><li>A billion photos </li></ul></ul><ul><ul><li>2…20 years of video (continuous) </li></ul></ul>[Figure: storage scale running Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers for A Photo, A Book, Movie, All LoC books (words), All Books MultiMedia, and Everything! Recorded.] See Mike Lesk, How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html See Lyman & Varian, How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/
  5. 5. Why Put Everything in Cyberspace? Low rent: min $/byte. Shrinks time: now or later. Shrinks space: here or there. Automate processing: knowbots. Point-to-Point OR Broadcast. Immediate OR Time Delayed. Locate, Process, Analyze, Summarize.
  6. 6. <ul><li>Gordon Bell’s shoebox: </li></ul><ul><li>Scans 20 k “pages” tiff @ 300 dpi 1 GB </li></ul><ul><li>Music: 2 k “tracks” 7 GB </li></ul><ul><li>Photos: 13 k images 2 GB </li></ul><ul><li>Video: 10 hrs 3 GB </li></ul><ul><li>Docs: 3 k ppt, word,.. 2 GB </li></ul><ul><li>Mail: 50 k messages 2 GB </li></ul><ul><ul><li> 16 GB </li></ul></ul>Most storage is personal 90% of disks are IDE/ATA 85% of bytes are
  7. 7. How will we find it? Put everything in the DB (and index it) <ul><li>More than a file system </li></ul><ul><li>Unifies data and meta-data </li></ul><ul><li>Simpler to manage </li></ul><ul><li>Easier to subset and reorganize </li></ul><ul><li>Set-oriented access </li></ul><ul><li>Allows online updates </li></ul><ul><li>Automatic indexing </li></ul><ul><li>Automatic replication </li></ul>SQL SQL
  8. 8. How do we represent it to the outside world? <ul><li>File metaphor too primitive: just a blob </li></ul><ul><li>Table metaphor too primitive: just records </li></ul><ul><li>Need Metadata describing data context </li></ul><ul><ul><li>Format </li></ul></ul><ul><ul><li>Provenance (author/publisher/citations/…) </li></ul></ul><ul><ul><li>Rights </li></ul></ul><ul><ul><li>History </li></ul></ul><ul><ul><li>Related documents </li></ul></ul><ul><li>In a standard format </li></ul><ul><li>XML and XML schema </li></ul><ul><li>DataSet is a great example of this </li></ul><ul><li>World is now defining standard schemas </li></ul>Example DataSet (schema, then data or diffgram): <ul><li>&lt;?xml version="1.0" encoding="utf-8" ?&gt; </li></ul><ul><li>&lt;DataSet xmlns="http://WWT.sdss.org/"&gt; </li></ul><ul><li>&lt;xs:schema id="radec" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"&gt; </li></ul><ul><li>&lt;xs:element name="radec" msdata:IsDataSet="true"&gt; </li></ul><ul><li>&lt;xs:element name="Table"&gt; </li></ul><ul><ul><li>&lt;xs:element name="ra" type="xs:double" minOccurs="0" /&gt; </li></ul></ul><ul><ul><li>&lt;xs:element name="dec" type="xs:double" minOccurs="0" /&gt; </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><ul><li>&lt;diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1"&gt; </li></ul></ul><ul><ul><li>&lt;radec xmlns=""&gt; </li></ul></ul><ul><ul><li>&lt;Table diffgr:id="Table1" msdata:rowOrder="0"&gt; </li></ul></ul><ul><ul><li>&lt;ra&gt;184.028935351008&lt;/ra&gt; </li></ul></ul><ul><ul><li>&lt;dec&gt;-1.12590950121524&lt;/dec&gt; </li></ul></ul><ul><ul><li>&lt;/Table&gt; </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><ul><li>&lt;Table diffgr:id="Table10" msdata:rowOrder="9"&gt; </li></ul></ul><ul><ul><li>&lt;ra&gt;184.025719033547&lt;/ra&gt; </li></ul></ul><ul><ul><li>&lt;dec&gt;-1.21795827920186&lt;/dec&gt; </li></ul></ul><ul><ul><li>&lt;/Table&gt; </li></ul></ul><ul><ul><li>&lt;/radec&gt; </li></ul></ul><ul><ul><li>&lt;/diffgr:diffgram&gt; </li></ul></ul><ul><ul><li>&lt;/DataSet&gt; </li></ul></ul>
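A minimal sketch of how a client might pull the rows out of a DataSet-style diffgram like the one on this slide. It uses Python's standard library rather than the .NET DataSet class the talk refers to. The XML literal is a trimmed stand-in for the slide's document (schema omitted, only the two rows the slide shows), and `rows_from_diffgram` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the slide's DataSet: only the diffgram part, two rows.
DIFFGRAM = """<DataSet xmlns="http://WWT.sdss.org/">
  <diffgr:diffgram xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <radec xmlns="">
      <Table diffgr:id="Table1"><ra>184.028935351008</ra><dec>-1.12590950121524</dec></Table>
      <Table diffgr:id="Table10"><ra>184.025719033547</ra><dec>-1.21795827920186</dec></Table>
    </radec>
  </diffgr:diffgram>
</DataSet>"""


def rows_from_diffgram(xml_text):
    """Return one dict per <Table> row, mapping column name to float value."""
    root = ET.fromstring(xml_text)
    rows = []
    # <radec xmlns=""> resets the default namespace, so rows are plain "Table".
    for table in root.iter("Table"):
        rows.append({column.tag: float(column.text) for column in table})
    return rows


if __name__ == "__main__":
    for row in rows_from_diffgram(DIFFGRAM):
        print(row["ra"], row["dec"])
```

Because the whole answer set arrives as one self-describing document, the client needs a single request/response to get every row.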
  9. 9. There is a problem: Need Standard Data AND Methods <ul><li>XML data is GREAT!!!! </li></ul><ul><ul><li>XML documents are portable objects </li></ul></ul><ul><ul><li>XML documents are complex objects </li></ul></ul><ul><ul><li>WSDL defines the methods on objects (the class) </li></ul></ul><ul><li>But will all the implementations match? </li></ul><ul><ul><li>Think of UNIX or SQL or C or… </li></ul></ul><ul><li>We need conformance tests. </li></ul><ul><li>That’s why Web Services Interoperability is so important. http://www.ws-i.org/ </li></ul>Niklaus Wirth: Algorithms + Data Structures = Programs
  10. 10. Outline <ul><li>We will be able to store everything, </li></ul><ul><ul><li>How do we represent it? (objects) </li></ul></ul><ul><ul><li>How will we find it (aka: who cares?) </li></ul></ul><ul><li>PhotoServer: Objects vs records vs files, </li></ul><ul><ul><li>XML++ gives us portable objects. </li></ul></ul><ul><ul><li>Similarity search: better than nothing! </li></ul></ul><ul><li>Scalability: a solved problem, </li></ul><ul><ul><li>but… Trustworthy & Manageable is not. </li></ul></ul><ul><li>TerraServer and TerraService </li></ul><ul><ul><li>Why put everything in the database? </li></ul></ul><ul><ul><li>A prototypical Web Service. </li></ul></ul><ul><li>SkyServer and the World Wide Telescope </li></ul><ul><ul><li>Data Mining science data </li></ul></ul><ul><ul><li>Serving Windows/Macintosh/Unix clients with .Net </li></ul></ul><ul><ul><li>Federating Archives with .Net </li></ul></ul>
  11. 11. PhotoServer: Managing Photos <ul><li>Load all photos into the database </li></ul><ul><li>Annotate the photos </li></ul><ul><li>View by various attributes </li></ul><ul><li>Do similarity Search </li></ul><ul><li>Use XML for interchange </li></ul><ul><li>Use dbObject, Template for access </li></ul>SQL (for xml) Templates Schema DOM SQL, Templates, XML data XML datasets & mime data IIS jScript
  12. 12. How Similarity Search Works <ul><li>For each picture, the Loader </li></ul><ul><ul><li>Inserts thumbnails </li></ul></ul><ul><ul><li>Extracts 270 Features into a blob </li></ul></ul><ul><li>When looking for a similar picture </li></ul><ul><ul><li>Scan all photos comparing features (dot product of vectors) </li></ul></ul><ul><ul><li>Sort by similarity </li></ul></ul><ul><li>Feature blob is an array </li></ul><ul><ul><li>Today I fake the array with functions and cast: cast(substring(feature,72,8) as float) </li></ul></ul><ul><ul><li>When SQL Server gets C#, we won’t have to fake it. </li></ul></ul><ul><ul><li>And… it will run 100x faster (compiled managed code). </li></ul></ul><ul><li>Idea pioneered by IBM Research; we use a variant by MS Beijing Research. </li></ul>[Figure: example feature descriptions (“No black squares, 20% orange, … etc.”; “many black squares, 10% orange, … etc.”) with match scores of 72% and 27%.]
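The scan-and-sort idea on this slide can be sketched in a few lines. This is an illustration in Python, not the PhotoServer code: it assumes the feature blob is a packed array of little-endian 8-byte floats (the slide only shows that 8 bytes are cast to a float), and the names `unpack_features`, `dot`, and `most_similar` are hypothetical.

```python
import struct


def unpack_features(blob):
    """Decode a blob of packed 8-byte floats (assumed little-endian) into a tuple."""
    count = len(blob) // 8
    return struct.unpack("<%dd" % count, blob[:count * 8])


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def most_similar(query_blob, photos):
    """Scan every (photo_id, feature_blob) pair and return ids, best match first."""
    query = unpack_features(query_blob)
    scored = [(dot(query, unpack_features(blob)), photo_id)
              for photo_id, blob in photos]
    scored.sort(reverse=True)  # highest dot product first
    return [photo_id for _, photo_id in scored]


if __name__ == "__main__":
    pack = lambda *values: struct.pack("<%dd" % len(values), *values)
    library = [("a", pack(1.0, 0.0)), ("b", pack(0.0, 1.0)), ("c", pack(0.5, 0.5))]
    print(most_similar(pack(1.0, 0.0), library))
```

A raw dot product only ranks sensibly when the feature vectors are on a comparable scale; the slide does not say how the real features are normalized.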
  13. 13. Things I Learned from PhotoServer <ul><li>Data: </li></ul><ul><ul><li>XML data sets are a universal way to represent answers </li></ul></ul><ul><ul><li>XML data sets minimize round trips: 1 request/response </li></ul></ul><ul><li>Search </li></ul><ul><ul><li>It is BEST to index </li></ul></ul><ul><ul><li>You can put objects and attributes in a row (SQL puts big blobs off-page) </li></ul></ul><ul><ul><li>If you can’t index, You can extract attributes and quickly compare </li></ul></ul><ul><ul><li>SQL can scan at 2M records/cpu/second </li></ul></ul><ul><ul><li>Sequential scans are embarrassingly parallel. </li></ul></ul>
  14. 14. Outline <ul><li>We will be able to store everything, </li></ul><ul><ul><li>How do we represent it? (objects) </li></ul></ul><ul><ul><li>How will we find it (aka: who cares?) </li></ul></ul><ul><li>PhotoServer: Objects vs records vs files, </li></ul><ul><ul><li>XML++ gives us portable objects. </li></ul></ul><ul><ul><li>Similarity search: better than nothing! </li></ul></ul><ul><li>Scalability: a solved problem, </li></ul><ul><ul><li>but… Trustworthy & Manageable is not. </li></ul></ul><ul><li>TerraServer and TerraService </li></ul><ul><ul><li>Why put everything in the database? </li></ul></ul><ul><ul><li>A prototypical Web Service. </li></ul></ul><ul><li>SkyServer and the World Wide Telescope </li></ul><ul><ul><li>Data Mining science data </li></ul></ul><ul><ul><li>Serving Windows/Macintosh/Unix clients with .Net </li></ul></ul><ul><ul><li>Federating Archives with .Net </li></ul></ul>
  15. 15. Big! Servers <ul><li>ScaleUP: a BIG box </li></ul><ul><ul><li>SMP (32 cpus) </li></ul></ul><ul><ul><li>64 bit </li></ul></ul><ul><li>ScaleOut: computing by the slice </li></ul><ul><ul><li>6 years ago: 8ktpmC, today 750ktpmC </li></ul></ul><ul><ul><li>SQL Server is #1, #2, #3 (Windows is best DB2 platform too) </li></ul></ul><ul><li>VLDB Management </li></ul><ul><li>Availability: </li></ul><ul><ul><li>Clusters, remote logging, replication </li></ul></ul>
  16. 16. TPC measures peak performance and Price/Performance <ul><li>SQL Server always had best price Performance </li></ul><ul><li>Now best of both (using scaleout) </li></ul><ul><li>SMP performance also impressive </li></ul>Callouts: 32x8 900MHz Xeon, 256GB RAM, 59 TB disk; 32 900MHz Xeon, 64GB RAM, 15TB disk. Source: http://www.tpc.org/tpcc/results/tpcc_perf_results.asp <table><tr><th>Rank</th><th>Company</th><th>System</th><th>tpmC</th><th>price/tpmC</th><th>Database</th><th>OS</th><th>TP Mon</th><th>Date</th></tr><tr><td>1</td><td></td><td>ProLiant DL760-900-256P</td><td>709,220</td><td>14.96 US$</td><td>Microsoft SQL Server 2000 Enterprise</td><td>Microsoft Windows 2000 Advanced</td><td>Microsoft COM+</td><td>09/19/01</td></tr><tr><td>2</td><td></td><td>IBM eSeries370 c/s</td><td>688,220</td><td>22.58 US$</td><td>Microsoft SQL Server 2000</td><td>Microsoft Windows 2000 Datacenter</td><td>Microsoft COM+</td><td>04/10/01</td></tr><tr><td>3</td><td></td><td>ProLiant DL760-900-192P</td><td>567,882</td><td>14.04 US$</td><td>Microsoft SQL Server 2000 Enterprise</td><td>Microsoft Windows 2000 Advanced</td><td>Microsoft COM+</td><td>09/19/01</td></tr><tr><td>7</td><td>HP</td><td>HP 9000 Superdome</td><td>389,435</td><td>21.24 US$</td><td>Oracle 9i Enterprise</td><td>HP UX 11.i 64-bit</td><td>BEA Tuxedo 6.4</td><td>12/21/01</td></tr><tr><td>14</td><td>Unisys</td><td>e-@ction Enterprise Server ES7000</td><td>165,219</td><td>21.33 US$</td><td>Microsoft SQL Server 2000 Enterprise</td><td>Microsoft Windows 2000 Datacenter LE</td><td>Microsoft COM+</td><td>09/19/01</td></tr></table>
  17. 17. Scale Out : Buy Computing by the Slice 709,220 tpmC! == 1 Billion transactions/day <ul><li>Slice: 8cpu, 8GB, 100 disks (=1.8TB) 20ktpmC per slice, ~300k$/slice </li></ul><ul><li>clients and 4 DTC nodes not shown </li></ul>
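The "1 Billion transactions/day" claim is a unit conversion from transactions per minute. The worked arithmetic below uses the top tpmC figure from the TPC-C results table on the previous slide (709,220) and the price/tpmC listed there (14.96 US$). The function names are illustrative only.

```python
MINUTES_PER_DAY = 60 * 24


def transactions_per_day(tpm):
    """Convert a transactions-per-minute rate into transactions per day."""
    return tpm * MINUTES_PER_DAY


def system_price(tpm, price_per_tpm):
    """Total system price implied by a price/tpmC figure."""
    return tpm * price_per_tpm


if __name__ == "__main__":
    print(transactions_per_day(709_220))        # a little over one billion
    print(round(system_price(709_220, 14.96)))  # implied price in US$
```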
  18. 18. ScaleUp: A Very Big System! <ul><li>UNISYS Windows 2000 Data Center Limited Edition </li></ul><ul><li>32 cpus on </li></ul><ul><li>32 GB of RAM and </li></ul><ul><li>1,061 disks (15.5 TB) </li></ul><ul><li>Will be helped by 64bit addressing </li></ul>24 fiber channel
  19. 19. Outline <ul><li>We will be able to store everything, </li></ul><ul><ul><li>How do we represent it? (objects) </li></ul></ul><ul><ul><li>How will we find it (aka: who cares?) </li></ul></ul><ul><li>PhotoServer: Objects vs records vs files, </li></ul><ul><ul><li>XML++ gives us portable objects. </li></ul></ul><ul><ul><li>Similarity search: better than nothing! </li></ul></ul><ul><li>Scalability: a solved problem, </li></ul><ul><ul><li>but… Trustworthy & Manageable is not. </li></ul></ul><ul><li>TerraServer and TerraService </li></ul><ul><ul><li>Why put everything in the database? </li></ul></ul><ul><ul><li>A prototypical Web Service. </li></ul></ul><ul><li>SkyServer and the World Wide Telescope </li></ul><ul><ul><li>Data Mining science data </li></ul></ul><ul><ul><li>Serving Windows/Macintosh/Unix clients with .Net </li></ul></ul><ul><ul><li>Federating Archives with .Net </li></ul></ul>
  20. 20. TerraServer – A SQL poster child http://TerraServer.HomeAdvisor.Microsoft.com/ <ul><li>3 x 2 TB databases </li></ul><ul><li>18TB disk tri-plexed (=6TB) </li></ul><ul><li>3 + 1 Cluster </li></ul><ul><li>99.96% uptime </li></ul><ul><li>1B page views 5B DB queries </li></ul><ul><li>Now a .NET web service </li></ul>
  21. 21. Image Data <ul><li>All in the database: 200x200 pixel tiles, compressed </li></ul><ul><li>Spatial access: z-Transform Btree </li></ul><ul><li>USGS Aerial photos “DOQ”: 12 TB, 95% U.S. coverage, 1 m resolution </li></ul><ul><li>USGS Topo Maps: 1 TB, 100% U.S. coverage, 2 m resolution </li></ul><ul><li>Encarta Virtual Globe: 1 km resolution, 100% world coverage </li></ul>
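The slide does not spell out the key layout behind the "z-Transform Btree". A common way to get spatial locality out of an ordinary B-tree is a Z-order (Morton) key, which interleaves the bits of a tile's x and y coordinates so that nearby tiles tend to get nearby keys. The sketch below shows that generic technique under the assumption that tiles are addressed by non-negative integer (x, y) coordinates; it is not taken from the TerraServer schema.

```python
def z_order_key(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one integer key.

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


if __name__ == "__main__":
    # The four tiles of a 2x2 block get four consecutive keys.
    for y in range(2):
        for x in range(2):
            print((x, y), z_order_key(x, y))
```

With keys like this, every aligned 2^k x 2^k block of tiles occupies one contiguous key range, so a range scan over the index fetches a whole block.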
  22. 22. TerraServer Traffic & Database Growth [Chart: database growth through Jan 2002. Configuration labels: 1 Server / Win NT 4.0 EE; 2nd Server / Win 2k DataCenter; 4 Node / Win2k Datacenter Failover Cluster. Database labels range from SQL 7.0 with a .75 TB Db to SQL 2000 with 2.0 TB Dbs; row-count labels range from 173 M to 900 M rows.] <table><tr><th></th><th>Sessions</th><th>Page Views</th><th>Image Tiles</th><th>Db Queries</th><th>Bytes Xfered</th></tr><tr><td>Average Day</td><td>44,320</td><td>879,720</td><td>3,786,551</td><td>4,566,024</td><td>59 GB</td></tr><tr><td>Peak Day</td><td>277,292</td><td>2,401,209</td><td>12,388,104</td><td>10,475,674</td><td>163 GB</td></tr><tr><td>1998-2001</td><td>44,851,547</td><td>890,277,087</td><td>3,831,989,887</td><td>4,620,815,913</td><td>59 TB</td></tr></table>
  23. 23. Hardware. One SQL database per rack (SQLInst1, SQLInst2, SQLInst3, Spare). Each rack contains 4.5 TB. 261 total drives / 13.7 TB total. Meta Data stored on 101 GB “Fast, Small Disks” (18 x 18.2 GB). Imagery Data stored on 4 339 GB “Slow, Big Disks” (15 x 73.8 GB). To add 90 72.8 GB disks in Feb 2001 to create 18 TB SAN. 8 Compaq DL360 “Photon” Web Servers. Fiber SAN Switches. 4 Compaq ProLiant 8500 Db Servers.
  24. 24. TerraServer Lessons Learned <ul><li>Hardware is 5 9’s (with clustering) </li></ul><ul><li>Software is 5 9’s (with clustering) </li></ul><ul><li>Admin is 4 9’s (offline maintenance) </li></ul><ul><li>Network is 3 9’s (mistakes, environment) </li></ul><ul><li>Simple designs are best </li></ul><ul><li>10 TB DB is management limit: 1 PB = 100 x 10 TB DB; this is 100x better than 5 years ago. </li></ul><ul><li>Minimize use of tape </li></ul><ul><ul><li>Backup to disk (snapshots) </li></ul></ul><ul><ul><li>Portable disk TBs </li></ul></ul>
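To make the "nines" concrete, the arithmetic below converts a count of nines into expected downtime per year, and multiplies the four availabilities together to estimate the end-to-end figure. The multiplication assumes the four layers fail independently and that all must be up at once, which is a simplification the slide does not state. Function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(nines):
    """Expected downtime per year for an availability of `nines` nines."""
    unavailability = 10 ** -nines
    return unavailability * MINUTES_PER_YEAR


def combined_availability(nines_per_layer):
    """Availability of a stack whose layers must all be up (independence assumed)."""
    availability = 1.0
    for nines in nines_per_layer:
        availability *= 1 - 10 ** -nines
    return availability


if __name__ == "__main__":
    for label, nines in [("hardware", 5), ("software", 5), ("admin", 4), ("network", 3)]:
        print(label, round(downtime_minutes_per_year(nines), 1), "min/year")
    print("stack:", round(combined_availability([5, 5, 4, 3]), 5))
```

Under these assumptions the weakest layer dominates: three nines of network availability alone is several hours of downtime a year.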
  25. 25. TerraService http://TerraService.Net/ <ul><li>Added .NET web services to TerraServer </li></ul><ul><ul><li>A great way to learn what Web Services are </li></ul></ul><ul><ul><li>And what .Net is. </li></ul></ul><ul><li>Image server </li></ul><ul><ul><li>Gives arbitrary rectangle/zoom of US </li></ul></ul><ul><ul><li>Overlays features (hospitals, schools,..) </li></ul></ul><ul><li>Census service </li></ul><ul><li>You can use it in your app. </li></ul><ul><li>USDA is using it today. </li></ul>Demo Tour API Demo map maker Mention location and census services
  26. 26. Outline <ul><li>We will be able to store everything, </li></ul><ul><ul><li>How do we represent it? (objects) </li></ul></ul><ul><ul><li>How will we find it (aka: who cares?) </li></ul></ul><ul><li>PhotoServer: Objects vs records vs files, </li></ul><ul><ul><li>XML++ gives us portable objects. </li></ul></ul><ul><ul><li>Similarity search: better than nothing! </li></ul></ul><ul><li>Scalability: a solved problem, </li></ul><ul><ul><li>but… Trustworthy & Manageable is not. </li></ul></ul><ul><li>TerraServer and TerraService </li></ul><ul><ul><li>Why put everything in the database? </li></ul></ul><ul><ul><li>A prototypical Web Service. </li></ul></ul><ul><li>SkyServer and the World Wide Telescope </li></ul><ul><ul><li>Data Mining science data </li></ul></ul><ul><ul><li>Serving Windows/Macintosh/Unix clients with .Net </li></ul></ul><ul><ul><li>Federating Archives with .Net </li></ul></ul>
  27. 27. Computational Science The Third Science Branch is Evolving <ul><li>In the beginning science was empirical . </li></ul><ul><li>Then theoretical branches evolved. </li></ul><ul><li>Now, we have computational branches. </li></ul><ul><ul><li>Has primarily been simulation </li></ul></ul><ul><ul><li>Growth area data analysis/visualization of peta-scale instrument data . </li></ul></ul><ul><li>Computational Science </li></ul><ul><ul><li>Data captured by instruments Or data generated by simulator </li></ul></ul><ul><ul><li>Processed by software </li></ul></ul><ul><ul><li>Placed in a database / files </li></ul></ul><ul><ul><li>Scientist analyzes database / files </li></ul></ul>
  28. 28. Exploring Parameter Space Manual or Automatic Data Mining <ul><li>There is LOTS of data </li></ul><ul><ul><li>people cannot examine most of it. </li></ul></ul><ul><ul><li>Need computers to do analysis. </li></ul></ul><ul><li>Manual or Automatic Exploration </li></ul><ul><ul><li>Manual : person suggests hypothesis, computer checks hypothesis </li></ul></ul><ul><ul><li>Automatic : Computer suggests hypothesis person evaluates significance </li></ul></ul><ul><li>Given an arbitrary parameter space: </li></ul><ul><ul><li>Data Clusters </li></ul></ul><ul><ul><li>Points between Data Clusters </li></ul></ul><ul><ul><li>Isolated Data Clusters </li></ul></ul><ul><ul><li>Isolated Data Groups </li></ul></ul><ul><ul><li>Holes in Data Clusters </li></ul></ul><ul><ul><li>Isolated Points </li></ul></ul>Nichol et al. 2001 Slide courtesy of and adapted from Robert Brunner @ CalTech.
  29. 29. What’s needed? (not drawn to scale) Scientists Tools Plumbers Databases to Store Data And Execute Queries Science Data & Questions Question & Answer Visualization Data Mining Algorithms Miners
  30. 30. Some science is hitting a wall: FTP and GREP are not adequate <ul><li>You can GREP 1 MB in a second </li></ul><ul><li>You can GREP 1 GB in a minute </li></ul><ul><li>You can GREP 1 TB in 2 days </li></ul><ul><li>You can GREP 1 PB in 3 years. </li></ul><ul><li>Oh!, and 1PB ~10,000 disks </li></ul><ul><li>At some point you need indices to limit search, and parallel data search and analysis </li></ul><ul><li>This is where databases can help </li></ul><ul><li>Goal: make it easy to </li></ul><ul><ul><li>Publish: Record structured data </li></ul></ul><ul><ul><li>Find: Find data anywhere in the network </li></ul></ul><ul><ul><ul><li>Get the subset you need </li></ul></ul></ul><ul><ul><li>Explore datasets interactively </li></ul></ul><ul><li>You can FTP 1 MB in 1 sec </li></ul><ul><li>You can FTP 1 GB / min (= 1 $/GB) </li></ul><ul><li>… 2 days and 1K$ </li></ul><ul><li>… 3 years and 1M$ </li></ul>
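The GREP figures on this slide are round numbers. A quick way to read them is to work out the sequential scan rate each one implies. The sketch below does that arithmetic with decimal units (1 MB = 10^6 bytes) and a 365-day year, both of which are assumptions; `implied_mb_per_s` is a hypothetical helper.

```python
SECONDS = {"second": 1, "minute": 60, "day": 86_400, "year": 365 * 86_400}


def implied_mb_per_s(num_bytes, amount, unit):
    """Scan rate in MB/s implied by reading `num_bytes` in `amount` `unit`s."""
    return num_bytes / (amount * SECONDS[unit]) / 1e6


if __name__ == "__main__":
    claims = [
        ("1 MB in a second", 1e6, 1, "second"),
        ("1 GB in a minute", 1e9, 1, "minute"),
        ("1 TB in 2 days", 1e12, 2, "day"),
        ("1 PB in 3 years", 1e15, 3, "year"),
    ]
    for label, size, amount, unit in claims:
        print(label, "->", round(implied_mb_per_s(size, amount, unit), 1), "MB/s")
```

Every line works out to somewhere between 1 and roughly 17 MB/s, that is, a single sequential stream. The rate stays about the same while the data grows a billionfold, which is why the slide argues for indices and parallel search.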
  31. 31. Web Services are The Key <ul><li>Web SERVER: </li></ul><ul><ul><li>Given a url + parameters </li></ul></ul><ul><ul><li>Returns a web page (often dynamic) </li></ul></ul><ul><li>Web SERVICE: </li></ul><ul><ul><li>Given an XML document (SOAP msg) </li></ul></ul><ul><ul><li>Returns an XML document </li></ul></ul><ul><ul><li>Tools make this look like an RPC. </li></ul></ul><ul><ul><ul><li>F(x,y,z) returns (u, v, w) </li></ul></ul></ul><ul><ul><li>Distributed objects for the web. </li></ul></ul><ul><ul><li>+ naming, discovery, security,.. </li></ul></ul><ul><li>Internet-scale distributed computing </li></ul>[Diagram: your program calls a Web Server over http and gets back a web page; your program calls a Web Service over soap and gets back an object in xml, which becomes data in your address space.]
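A rough sketch of what "tools make this look like an RPC" hides: the call F(x, y, z) is wrapped in an XML envelope on the way out and the reply is unwrapped on the way back. The code below builds and parses SOAP 1.1 style envelopes with Python's standard library and does no network I/O. The service namespace and the helper names `build_request` and `parse_response` are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/demo"                 # hypothetical service namespace


def build_request(method, params):
    """Wrap a method name and its named parameters in a SOAP envelope string."""
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}%s" % (SERVICE_NS, method))
    for name, value in params.items():
        ET.SubElement(call, "{%s}%s" % (SERVICE_NS, name)).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Return the children of the first element in the SOAP Body as a dict of strings."""
    body = ET.fromstring(xml_text).find("{%s}Body" % SOAP_NS)
    result = body[0]  # first child of Body carries the payload
    return {child.tag.split("}")[-1]: child.text for child in result}


if __name__ == "__main__":
    message = build_request("F", {"x": 1, "y": 2, "z": 3})
    print(message)
    print(parse_response(message))  # same parser works on a reply envelope
```

A generated proxy class does exactly this packing and unpacking, driven by the service's WSDL, so the caller only sees a method call.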
  32. 32. Data Federations of Web Services <ul><li>Massive datasets live near their owners: </li></ul><ul><ul><li>Near the instrument’s software pipeline </li></ul></ul><ul><ul><li>Near the applications </li></ul></ul><ul><ul><li>Near data knowledge and curation </li></ul></ul><ul><ul><li>Super Computer centers become Super Data Centers </li></ul></ul><ul><li>Each Archive publishes a web service </li></ul><ul><ul><li>Schema: documents the data </li></ul></ul><ul><ul><li>Methods on objects (queries) </li></ul></ul><ul><li>Scientists get “personalized” extracts </li></ul><ul><li>Uniform access to multiple Archives </li></ul><ul><ul><li>A common global schema </li></ul></ul>Federation
  33. 33. Why Astronomy Data? <ul><li>It has no commercial value </li></ul><ul><ul><li>No privacy concerns </li></ul></ul><ul><ul><li>Can freely share results with others </li></ul></ul><ul><ul><li>Great for experimenting with algorithms </li></ul></ul><ul><li>It is real and well documented </li></ul><ul><ul><li>High-dimensional data (with confidence intervals) </li></ul></ul><ul><ul><li>Spatial data </li></ul></ul><ul><ul><li>Temporal data </li></ul></ul><ul><li>Many different instruments from many different places and many different times </li></ul><ul><li>Federation is a goal </li></ul><ul><li>The questions are interesting </li></ul><ul><ul><li>How did the universe form? </li></ul></ul><ul><li>There is a lot of it (petabytes) </li></ul>[Images: the sky in different bands: IRAS 100μ, ROSAT ~keV, DSS Optical, 2MASS 2μ, IRAS 25μ, NVSS 20cm, WENSS 92cm, GB 6cm.]
  34. 34. Web Services & Grid Enable Virtual Observatory http://www.astro.caltech.edu/nvoconf/ http://www.voforum.org/ <ul><li>The Internet will be the world’s best telescope: </li></ul><ul><ul><li>It has data on every part of the sky </li></ul></ul><ul><ul><li>In every measured spectral band: optical, x-ray, radio.. </li></ul></ul><ul><ul><li>As deep as the best instruments (2 years ago). </li></ul></ul><ul><ul><li>It is up when you are up. The “seeing” is always great (no working at night, no clouds no moons no..). </li></ul></ul><ul><ul><li>It’s a smart telescope : links objects and data to literature on them. </li></ul></ul><ul><li>W3C & IETF standards Provide </li></ul><ul><ul><li>Naming </li></ul></ul><ul><ul><li>Authorization / Security / Privacy </li></ul></ul><ul><ul><li>Distributed Objects </li></ul></ul><ul><ul><ul><li>Discovery, Definition, Invocation, Object Model </li></ul></ul></ul><ul><ul><li>Higher level services: workflow, transactions, DB,.. </li></ul></ul>
  35. 35. Steps to Virtual Observatory Prototype <ul><li>Define a set of Astronomy Objects and methods. </li></ul><ul><ul><li>Based on UDDI, WSDL, XSL, SOAP, dataSet </li></ul></ul><ul><li>Use them locally to debug ideas </li></ul><ul><ul><li>Schema, Units,… </li></ul></ul><ul><ul><li>Dataset problems </li></ul></ul><ul><ul><li>Typical use scenarios. </li></ul></ul><ul><li>Federate different archives </li></ul><ul><ul><li>Each archive is a web service </li></ul></ul><ul><ul><li>Global query tool accesses them </li></ul></ul><ul><li>Working on this plan with </li></ul><ul><ul><li>Sloan Digital Sky Survey and CalTech/Palomar. Especially Alex Szalay et al. at JHU </li></ul></ul>
  36. 36. Sloan Digital Sky Survey http://www.sdss.org/ <ul><li>For the last 12 years astronomers have been building a telescope (with funding from Sloan Foundation, NSF, and a dozen universities). 90M$. </li></ul><ul><li>Y2000: engineer, calibrate, commission: now public data. </li></ul><ul><ul><li>5% of the survey, 600 sq degrees, 15 M objects, 60GB, ½ TB raw. </li></ul></ul><ul><ul><li>This data includes most of the known high z quasars. </li></ul></ul><ul><ul><li>It has a lot of science left in it but…. </li></ul></ul><ul><li>Now the data is arriving: </li></ul><ul><ul><li>250GB/night (20 nights per year) = 5TB/y. </li></ul></ul><ul><ul><li>100 M stars, 100 M galaxies, 1 M spectra. </li></ul></ul><ul><li>http://www.sdss.org/ </li></ul>
  37. 37. Demo of Sky Server <ul><li>http://skyserver.sdss.org/ </li></ul><ul><li>Demo sky server </li></ul><ul><li>Demo Explorer </li></ul><ul><li>Explain need for Unix/Mac clients </li></ul><ul><li>Demo Java SQLQA? </li></ul><ul><li>Talk about federation plan. </li></ul><ul><li>Work is product of Alex Szalay @ Johns Hopkins; Tanu Malik did SQLQA. </li></ul>
  38. 38. Two kinds of SDSS data in an SQL DB (objects and images all in DB) <ul><li>15M Photo Objects ~ 400 attributes </li></ul><ul><li>50K Spectra with ~30 lines/spectrum </li></ul>
  39. 39. Spatial Data Access – SQL extension (Szalay, Kunszt, Brunner) http://www.sdss.jhu.edu/htm <ul><li>Added Hierarchical Triangular Mesh (HTM) table-valued function for spatial joins. </li></ul><ul><li>Every object has a 20-deep Mesh ID. </li></ul><ul><li>Given a spatial definition: Routine returns up to ~10 covering triangles. </li></ul><ul><li>Spatial query is then up to ~10 range queries. </li></ul><ul><li>Very fast: 10,000 triangles / second / cpu. </li></ul><ul><li>Based on SQL Server Extended Stored Procedure </li></ul>
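The "spatial query becomes a handful of range queries" idea can be illustrated with a small sketch. This is not the HTM library: the mesh IDs and the covering ranges below are made-up values, and the step that computes the covering triangles from a spatial definition is assumed to exist elsewhere. The sketch only shows how a sorted ID index turns each covering range into one cheap range lookup.

```python
from bisect import bisect_left, bisect_right

def range_search(sorted_ids, ranges):
    """Return the IDs that fall inside any inclusive [lo, hi] range.

    sorted_ids must be sorted, the way an index on the mesh ID would be.
    """
    hits = []
    for lo, hi in ranges:
        start = bisect_left(sorted_ids, lo)   # first ID >= lo
        end = bisect_right(sorted_ids, hi)    # one past the last ID <= hi
        hits.extend(sorted_ids[start:end])
    return hits

# Hypothetical mesh IDs of stored objects (illustrative values only).
object_ids = sorted([1001, 1007, 1033, 2050, 2051, 3999, 4100])

# Hypothetical ID ranges standing in for the covering triangles of a query region.
cover = [(1000, 1010), (2050, 2060)]

matches = range_search(object_ids, cover)
```

Each covering range costs two binary searches plus the matching rows, which is why a query that reduces to about ten ranges stays cheap even on a large table.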
  40. 40. Data Loading <ul><li>JavaScript of DB loader (DTS) </li></ul><ul><li>Web ops interface & workflow system </li></ul><ul><li>Data ingest and scrubbing is major effort </li></ul><ul><ul><li>Test data quality </li></ul></ul><ul><ul><li>Chase down bugs / inconsistencies </li></ul></ul><ul><li>Other major task is data documentation </li></ul><ul><ul><li>Explain the data </li></ul></ul><ul><ul><li>Explain the schema and functions. </li></ul></ul><ul><li>If we supported users, … </li></ul>
  41. 41. Scenario Design <ul><li>Astronomers proposed 20 questions </li></ul><ul><ul><li>Typical of things they want to do </li></ul></ul><ul><ul><li>Each would require a week of programming in tcl / C++/ FTP </li></ul></ul><ul><li>Goal, make it easy to answer questions </li></ul><ul><li>DB and tools design motivated by this goal </li></ul><ul><ul><li>Implemented utility procedures </li></ul></ul><ul><ul><li>JHU Built GUI for Linux clients </li></ul></ul>
<ul><li>Q1: Find all galaxies without unsaturated pixels within 1' of a given point of ra=75.327, dec=21.023 </li></ul>
<ul><li>Q2: Find all galaxies with blue surface brightness between 23 and 25 mag per square arcsecond, and -10<super galactic latitude (sgb) <10, and declination less than zero. </li></ul>
<ul><li>Q3: Find all galaxies brighter than magnitude 22, where the local extinction is >0.75. </li></ul>
<ul><li>Q4: Find galaxies with an isophotal surface brightness (SB) larger than 24 in the red band, with an ellipticity>0.5, and with the major axis of the ellipse having a declination of between 30” and 60” arc seconds. </li></ul>
<ul><li>Q5: Find all galaxies with a de Vaucouleurs profile (r^(1/4) falloff of intensity on disk) and the photometric colors consistent with an elliptical galaxy. The de Vaucouleurs profile </li></ul>
<ul><li>Q6: Find galaxies that are blended with a star, output the deblended galaxy magnitudes. </li></ul>
<ul><li>Q7: Provide a list of star-like objects that are 1% rare. </li></ul>
<ul><li>Q8: Find all objects with unclassified spectra. </li></ul>
<ul><li>Q9: Find quasars with a line width >2000 km/s and 2.5<redshift<2.7. </li></ul>
<ul><li>Q10: Find galaxies with spectra that have an equivalent width in Hα >40Å (Hα is the main hydrogen spectral line.) </li></ul>
<ul><li>Q11: Find all elliptical galaxies with spectra that have an anomalous emission line. </li></ul>
<ul><li>Q12: Create a gridded count of galaxies with u-g>1 and r<21.5 over 60<declination<70, and 200<right ascension<210, on a grid of 2’, and create a map of masks over the same grid. </li></ul>
<ul><li>Q13: Create a count of galaxies for each of the HTM triangles which satisfy a certain color cut, like 0.7u-0.5g-0.2i<1.25 && r<21.75, output it in a form adequate for visualization. </li></ul>
<ul><li>Q14: Find stars with multiple measurements and have magnitude variations >0.1. Scan for stars that have a secondary object (observed at a different time) and compare their magnitudes. </li></ul>
<ul><li>Q15: Provide a list of moving objects consistent with an asteroid. </li></ul>
<ul><li>Q16: Find all objects similar to the colors of a quasar at 5.5<redshift<6.5. </li></ul>
<ul><li>Q17: Find binary stars where at least one of them has the colors of a white dwarf. </li></ul>
<ul><li>Q18: Find all objects within 30 arcseconds of one another that have very similar colors: that is where the color ratios u-g, g-r, r-i are less than 0.05m. </li></ul>
<ul><li>Q19: Find quasars with a broad absorption line in their spectra and at least one galaxy within 10 arcseconds. Return both the quasars and the galaxies. </li></ul>
<ul><li>Q20: For each galaxy in the BCG data set (brightest color galaxy), in 160<right ascension<170, -25<declination<35 count of galaxies within 30" of it that have a photoz within 0.05 of that galaxy. </li></ul>
  42. 42. An easy one Q7: Provide a list of rare star-like objects. <ul><li>Found 14,681 buckets; first 140 buckets have 99%; time 62 seconds </li></ul><ul><li>CPU bound 226 k records/second (2 cpu) 250 KB/s. </li></ul>select cast((u-g) as int) as ug, cast((g-r) as int) as gr, cast((r-i) as int) as ri, cast((i-z) as int) as iz, count(*) as Population from stars group by cast((u-g) as int), cast((g-r) as int), cast((r-i) as int), cast((i-z) as int) order by count(*)
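The bucketing behind Q7 can be sketched outside the database as well. This is a minimal illustration, not the SkyServer code: the star magnitudes are invented, and `int()` is used on the assumption that the SQL cast truncates the color difference toward zero in the same way.

```python
from collections import Counter

def color_bucket(u, g, r, i, z):
    """Map five magnitudes to a 4-D integer color bucket (u-g, g-r, r-i, i-z)."""
    # int() truncates toward zero; assumed to match cast(... as int) in the query.
    return (int(u - g), int(g - r), int(r - i), int(i - z))

def buckets_by_rarity(stars):
    """Count stars per color bucket and list buckets from rarest to most common."""
    counts = Counter(color_bucket(*star) for star in stars)
    return sorted(counts.items(), key=lambda item: item[1])

# Hypothetical (u, g, r, i, z) magnitudes, for illustration only.
stars = [
    (19.0, 18.0, 17.5, 17.3, 17.2),
    (19.2, 18.1, 17.6, 17.4, 17.3),
    (22.5, 18.0, 17.5, 17.3, 17.2),  # unusually red in u-g
]

ranked = buckets_by_rarity(stars)
```

Sorting by population in ascending order puts the sparsely populated buckets first, which is the same effect as the `order by count(*)` in the query.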
  43. 43. An Easy One Q15: Find asteroids <ul><li>Sounds hard but there are 5 pictures of the object at 5 different times (color filters) and so can “see” velocity. </li></ul><ul><li>Image pipeline computes velocity. </li></ul><ul><li>Computing it from the 5 color x,y would also be fast </li></ul><ul><li>Finds 1,303 objects in 3 minutes, 140MBps. (could go 2x faster with more disks) </li></ul>select objId, dbo.fGetUrlEq(ra,dec) as url, /* return object ID & url */ sqrt(power(rowv,2) + power(colv,2)) as velocity from photoObj /* check each object */ where (power(rowv,2) + power(colv,2)) /* square of velocity */ between 50 and 1000 /* huge values = error */
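The selection logic of this query is simple enough to restate as a short sketch. The rows below are invented, and the field names only mirror the `rowv` and `colv` columns used in the slide; the thresholds 50 and 1000 are the ones from the query, applied to the squared velocity.

```python
import math

def moving_objects(rows, lo=50.0, hi=1000.0):
    """Keep objects whose squared velocity lies in [lo, hi], like SQL BETWEEN.

    rows is an iterable of (obj_id, rowv, colv) tuples; returns (obj_id, velocity).
    """
    selected = []
    for obj_id, rowv, colv in rows:
        v_squared = rowv ** 2 + colv ** 2
        if lo <= v_squared <= hi:          # BETWEEN is inclusive at both ends
            selected.append((obj_id, math.sqrt(v_squared)))
    return selected

# Hypothetical rows, for illustration only.
rows = [
    (1, 0.1, 0.2),      # essentially stationary: rejected
    (2, 6.0, 8.0),      # squared velocity 100: kept
    (3, 100.0, 100.0),  # implausibly large, treated as a measurement error: rejected
]

moving = moving_objects(rows)
```

Comparing the squared velocity avoids a square root for every rejected row; the root is taken only for the objects that are returned.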
  44. 44. Q15: Fast Moving Objects <ul><li>Find near earth asteroids: </li></ul><ul><li>Finds 3 objects in 11 minutes </li></ul><ul><ul><li>(or 52 seconds with an index) </li></ul></ul><ul><li>Ugly, but consider the alternatives (C programs and files and…) </li></ul><ul><ul><li>SELECT r.objID as rId, g.objId as gId, </li></ul></ul><ul><ul><li> dbo.fGetUrlEq(g.ra, g.dec) as url </li></ul></ul><ul><ul><li>FROM PhotoObj r, PhotoObj g </li></ul></ul><ul><ul><li>WHERE r.run = g.run and r.camcol=g.camcol and abs(g.field-r.field)<2 -- nearby </li></ul></ul><ul><ul><li>-- the red selection criteria </li></ul></ul><ul><ul><li>and ((power(r.q_r,2) + power(r.u_r,2)) > 0.111111 ) </li></ul></ul><ul><ul><li>and r.fiberMag_r between 6 and 22 and r.fiberMag_r < r.fiberMag_g and r.fiberMag_r < r.fiberMag_i </li></ul></ul><ul><ul><li>and r.parentID=0 and r.fiberMag_r < r.fiberMag_u and r.fiberMag_r < r.fiberMag_z </li></ul></ul><ul><ul><li>and r.isoA_r/r.isoB_r > 1.5 and r.isoA_r>2.0 </li></ul></ul><ul><ul><li>-- the green selection criteria </li></ul></ul><ul><ul><li>and ((power(g.q_g,2) + power(g.u_g,2)) > 0.111111 ) </li></ul></ul><ul><ul><li>and g.fiberMag_g between 6 and 22 and g.fiberMag_g < g.fiberMag_r and g.fiberMag_g < g.fiberMag_i </li></ul></ul><ul><ul><li>and g.fiberMag_g < g.fiberMag_u and g.fiberMag_g < g.fiberMag_z </li></ul></ul><ul><ul><li>and g.parentID=0 and g.isoA_g/g.isoB_g > 1.5 and g.isoA_g > 2.0 </li></ul></ul><ul><ul><li>-- the matchup of the pair </li></ul></ul><ul><ul><li>and sqrt(power(r.cx -g.cx,2)+ power(r.cy-g.cy,2)+power(r.cz-g.cz,2))*(10800/PI())< 4.0 </li></ul></ul><ul><ul><li>and abs(r.fiberMag_r-g.fiberMag_g)< 2.0 </li></ul></ul>
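The pair-matching term in this query is worth unpacking. Assuming `cx`, `cy`, `cz` are the Cartesian components of a unit vector pointing at the object, the Euclidean distance between two such vectors is the chord length, which for small angles is very nearly the angle in radians; multiplying by 10800/π (that is, 180 × 60 / π) converts radians to arcminutes, so the test `< 4.0` means "within about 4 arcminutes". The helper names below are illustrative, not SkyServer functions.

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Convert right ascension and declination (degrees) to a unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def separation_arcmin(a, b):
    """Approximate angular separation in arcminutes from two unit vectors.

    Uses the chord length as a small-angle stand-in for the angle in radians,
    mirroring sqrt(dx^2 + dy^2 + dz^2) * (10800 / PI()) in the query.
    """
    chord = math.dist(a, b)
    return chord * (10800 / math.pi)

# Two hypothetical detections 2 arcminutes apart in declination.
red = unit_vector(180.0, 0.0)
green = unit_vector(180.0, 2.0 / 60.0)

sep = separation_arcmin(red, green)
is_candidate_pair = sep < 4.0
```

The approximation is only good for small separations, which is the regime the query works in.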
  45. 48. Performance (on current SDSS data) <ul><li>Run times: on 15k$ COMPAQ Server (2 cpu, 1 GB, 8 disks) </li></ul><ul><li>Some take 10 minutes </li></ul><ul><li>Some take 1 minute </li></ul><ul><li>Median ~ 22 sec. </li></ul><ul><li>GHz processors are fast! </li></ul><ul><ul><li>(10 mips/IO, 200 ins/byte) </li></ul></ul><ul><ul><li>2.5 m rec/s/cpu </li></ul></ul>~1,000 IO/cpu sec ~ 64 MB IO/cpu sec
  46. 49. Sequential Scan Speed is Important <ul><li>In high-dimension data, best way is to search. </li></ul><ul><li>Sequential scan covering index is 10x faster </li></ul><ul><ul><li>Seconds vs minutes </li></ul></ul><ul><li>SQL scans at 2M records/s/cpu (!) </li></ul>
  47. 50. What we learned from the 20 Queries <ul><li>All have fairly short SQL programs -- a substantial advance over (tcl, C++) </li></ul><ul><li>Many are sequential one-pass and two-pass over data </li></ul><ul><li>Covering indices make scans run fast </li></ul><ul><li>Table valued functions are wonderful but limitations are painful. </li></ul><ul><li>Counting, Binning, Histograms VERY common </li></ul><ul><li>Spatial indices helpful. </li></ul><ul><li>Materialized view (Neighbors) helpful. </li></ul>
  48. 51. Cosmo: Computing the Cosmological Constant <ul><li>Compares simulated galaxy distribution to observed distribution </li></ul><ul><li>Measure distance between each pair of galaxies: a lot of work (10^8 x 10^8 = 10^16 steps) </li></ul><ul><li>Good algorithms make this ~Nlog 2 N </li></ul><ul><li>Needs LARGE main memory </li></ul><ul><li>Using Itanium donated by Compaq and SQL server for data store </li></ul><ul><li>(this is Alex Szalay, Adrian Pope,… of JHU). </li></ul>
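To see where the quadratic cost comes from, here is a minimal sketch of the naive approach: visit every unordered pair of points and histogram the separations. It is only an illustration of the brute-force baseline, not the JHU code, and the point coordinates and bin settings are invented. The faster tree-based algorithms the slide alludes to are not shown.

```python
import math
from itertools import combinations

def pair_separation_histogram(points, bin_width, n_bins):
    """Histogram the distances between all unordered pairs of points.

    Visits N*(N-1)/2 pairs, so the work grows quadratically with N.
    Pairs farther apart than bin_width * n_bins are ignored.
    """
    hist = [0] * n_bins
    for a, b in combinations(points, 2):
        k = int(math.dist(a, b) / bin_width)
        if k < n_bins:
            hist[k] += 1
    return hist

# Hypothetical 3-D positions, for illustration only.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (3, 0, 0)]

hist = pair_separation_histogram(points, bin_width=1.0, n_bins=4)
```

With four points there are six pairs; with 10^8 galaxies the same loop would run on the order of 10^16 times, which is why the slide stresses better algorithms and large main memory.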
  49. 52. Summary <ul><li>We will be able to store everything, </li></ul><ul><ul><li>The challenge is organizing and finding answers. </li></ul></ul><ul><li>PhotoServer: Objects vs records vs files, </li></ul><ul><ul><li>XML++ gives us portable objects. </li></ul></ul><ul><ul><li>Similarity search: better than nothing! </li></ul></ul><ul><li>Scalability: a solved problem, </li></ul><ul><ul><li>but… Trustworthy & Manageable is not. </li></ul></ul><ul><li>TerraServer and TerraService </li></ul><ul><ul><li>Why put everything in the database? </li></ul></ul><ul><ul><li>A prototypical Web Service. </li></ul></ul><ul><li>SkyServer and the World Wide Telescope </li></ul><ul><ul><li>Data Mining science data </li></ul></ul><ul><ul><li>Serving Windows/Macintosh/Unix clients with .Net </li></ul></ul><ul><ul><li>Federating Archives with .Net </li></ul></ul>
  50. 53. References <ul><li>These Slides </li></ul><ul><ul><li>http://research.Microsoft.com/~Gray/talks/ </li></ul></ul><ul><li>TerraServer & TerraService </li></ul><ul><ul><li>http://terraService.Net/ </li></ul></ul><ul><li>Virtual Observatory (aka World Wide Telescope) </li></ul><ul><ul><li>http://www.voforum.org/ </li></ul></ul><ul><li>SkyServer </li></ul><ul><ul><li>http://SkyServer.SDSS.org/ </li></ul></ul><ul><ul><li>See documents at http://SkyServer.SDSS.org/en/help/download/ </li></ul></ul><ul><li>Download “personal SkyServer” (1GB) </li></ul><ul><ul><li>http://research.Microsoft.com/~Gray/sdss/ </li></ul></ul>
