NDVI Profiles
Data Management: Manipulation (Create, Retrieve, Update, Delete, Index), Storage
Information Management: Transformations ‣ Browsing ‣ Iterating ‣ Searching ‣ Augmenting ‣ Mining ‣ Description ‣ Annotation ‣ Schematizati...
Example: Input Collection • Task: info extraction • Task: transformation • Task: visualization
WebMAPS: Data Flow (Mail, FTP, MODIS Reprojection Tool, Images, Region clipping, Geometry (IBGE))
NDVI
Related Work: • embedded • n-tier client/server (including web services) • mediators. Approaches to App-to-DMS binding. Informati...
Related Work: • embedded • n-tier client/server (including web services) • mediators. Descriptors are orthogonal to all of thes...
Sensor Data Extraction:
    dataset = gdal.Open(raster_file, GA_ReadOnly)
    # Obtaining the coefficients for the functions af...
WebMAPS
Case Study: WebMaps
Data Extraction:
    def raster2array(ul_pixel, lr_pixel, dtype=B):
        """Using ul_pixel and lr_pixel it generates a numpy arra...
Geometry Extraction:
    shp = ogr.Open(filepath)
    # Layer corresponding to the State of São Paulo
    layer = vf.shp.GetLayerByNa...
Spatial Operations
Organicer
Tese phd

My PhD thesis presentation

Published in: Education, Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
692
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
6
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Tese phd

  1. 1. Organization is Sharing: From eScience to Personal Information Management. Rodrigo Dias Arruda Senra. Advisor: Prof. Dr. Claudia Bauzer Medeiros. PhD thesis defense in Computer Science, Universidade Estadual de Campinas, Instituto de Computação. Campinas, 2012-12-10
  2. 2. Outline: • Motivation • Objectives • Contributions (SciFrame, Database Descriptors, Organographs) • Results
  3. 3. Motivation
  4. 4. Study the relation: Heterogeneity ↔ Organization ↔ Sharing
  11. 11. NDVI Profile Generation [diagram]: Geometries (IBGE), Spectral Images (NASA), Crops (Min. Agr) → HTTP, FTP → WebMAPS (PostGIS, Filesystem, Postgres) → HTTP → HTML, Microformats, 2D Plots
  12. 12. Objectives
  14. 14. • describe and compare eScience systems • match Applications' needs with DBMS capabilities • manage digital content hierarchies
  15. 15. Outline: • Motivation • Objectives • Contributions (SciFrame, Database Descriptors, Organographs) • Results
  16. 16. SciFrame
  17. 17. SciFrame: The Scientific Digital Data Processing Framework is a conceptual framework that describes systems or processes involving digital data manipulation.
  18. 18. SciFrame [diagram]: Interfacing (Acquisition: discovery, extraction, transference; Publication) • Information Management • Data Management
  22. 22. SciFrame, Data Management in detail: Manipulation (Create, Retrieve, Update, Delete, Index) • Storage
  25. 25. SciFrame
      Interfacing: Acquisition (Discovery, Extraction, Transference); Publication
      Data Management: Storage; Manipulation
      Information Management: Description; Transformation (Fusing, Filtering)
  26. 26. WebMaps
      Interfacing
        Acquisition
          Discovery: Geometries (IBGE), Raster (NASA), Crops (Min. Agr)
          Extraction: ad hoc extractor scripts (paparazzi)
          Transference: FTP and HTTP
        Publication: HTML, Microformats, 2D Plots
      Data Management
        Storage: Geometries (PostGIS), Raster (Files), Crops (Postgres)
        Manipulation: Geometries (CRDI), Raster (CRD), Crops (CRUDI)
      Information Management
        Description: Geometries (SHP, WKT), Raster (HDF, GeoTIFF)
        Transformation
          Fusing: NDVI Time Series
          Filtering: Cloud and noise removal (HANTS)
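A SciFrame description like the WebMaps one can be held as plain data, which makes it easy to check which aspects a description leaves blank. The sketch below is only an illustration: the names `SCIFRAME_ASPECTS`, `missing_aspects`, and `render` are hypothetical and do not come from the thesis; the values are transcribed from the WebMaps slide.

```python
from typing import Dict, List, Tuple

Aspect = Tuple[str, ...]

# The SciFrame aspects listed on the slides, as paths from layer to leaf.
SCIFRAME_ASPECTS: List[Aspect] = [
    ("Interfacing", "Acquisition", "Discovery"),
    ("Interfacing", "Acquisition", "Extraction"),
    ("Interfacing", "Acquisition", "Transference"),
    ("Interfacing", "Publication"),
    ("Data Management", "Storage"),
    ("Data Management", "Manipulation"),
    ("Information Management", "Description"),
    ("Information Management", "Transformation", "Fusing"),
    ("Information Management", "Transformation", "Filtering"),
]


def missing_aspects(description: Dict[Aspect, str]) -> List[Aspect]:
    """Return the SciFrame aspects that a system description leaves empty."""
    return [aspect for aspect in SCIFRAME_ASPECTS if not description.get(aspect)]


def render(description: Dict[Aspect, str]) -> str:
    """Render a description as one 'path: value' line per aspect."""
    return "\n".join(
        f"{' / '.join(aspect)}: {description.get(aspect, '(not described)')}"
        for aspect in SCIFRAME_ASPECTS
    )


# Values taken from the WebMaps slide.
webmaps: Dict[Aspect, str] = {
    ("Interfacing", "Acquisition", "Discovery"): "Geometries (IBGE), Raster (NASA), Crops (Min. Agr)",
    ("Interfacing", "Acquisition", "Extraction"): "ad hoc extractor scripts (paparazzi)",
    ("Interfacing", "Acquisition", "Transference"): "FTP and HTTP",
    ("Interfacing", "Publication"): "HTML, Microformats, 2D Plots",
    ("Data Management", "Storage"): "Geometries (PostGIS), Raster (Files), Crops (Postgres)",
    ("Data Management", "Manipulation"): "Geometries (CRDI), Raster (CRD), Crops (CRUDI)",
    ("Information Management", "Description"): "Geometries (SHP, WKT), Raster (HDF, GeoTIFF)",
    ("Information Management", "Transformation", "Fusing"): "NDVI Time Series",
    ("Information Management", "Transformation", "Filtering"): "Cloud and noise removal (HANTS)",
}

print(render(webmaps))
print(missing_aspects(webmaps))
```

Keeping the aspect list separate from the descriptions means two systems can be compared line by line with the same `render` output.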
  27. 27. Research Problems
      Interfacing
        Acquisition
          Discovery: data scattered, many providers, search engines?
          Extraction: feasibility, preserve provenance, lack of semantics
          Transference: availability, voluminous data, bandwidth, protocol
        Publication: lack of intention, access control, traceability
      Data Management
        Storage: scalability, distribution, consistency, preservation
        Manipulation: multimedia, impedance mismatch
      Information Management
        Description: implicit vs. explicit, semantic web, social, trust, privacy
        Transformation: information lost (conceptual > logical > physical); multi-modality; handle uncertain and incomplete data
  28. 28. Technologies
      Interfacing
        Acquisition
          Discovery: DAS Registry, BIOCatalogue, SciScope
          Extraction: Scrapers, Wrappers, PiggyBank, Operator
          Transference: Streaming, P2P, OpenDAP
        Publication: SOA vs. ROA, Microformats vs. RDFa
      Data Management
        Storage: Scientific Datasets, XML, Cloud Computing
        Manipulation: SQL extensions, ORMs, LINQ
      Information Management
        Description: In Loco Semantics
        Transformation: Array Algebra (RASDAMAN); Topological Operators (GIS); Proximity Search and Report Language (ISIS)
  30. 30. Data Management
  32. 32. Data Management: ✓ enforce loose coupling between Apps and DBMS ✓ DBMS product/vendor independence ✓ seamless cross-database migration ✓ capability verification, validation and negotiation ✓ support Apps and DBMS in the cloud!
  33. 33. Database Descriptors
  35. 35. DBMS Descriptors: • Feature descriptor: specifies what a DBMS provides • Desiderata descriptor: specifies what a client application (App) needs
  42. 42. Architecture [diagram]: Web • DMS X, DMS Y, DMS Z • Descriptor Registries • descriptor X, descriptor Y • Negotiator • App • binding
  43. 43. DBD Structure [diagram]: App, DBMS. * http://dublincore.org/documents/dces/
  44. 44. Feature Descriptor
      @prefix : <http://www.lis.ic.unicamp.br/purl/DBD/> .
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix dc: <http://purl.org/dc/elements/1.1/> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .

      :Cmbm a foaf:Person ;
          foaf:name "Claudia Bauzer Medeiros" .

      :DBD1 dc:identifier "DBD1" ;
          dc:type "Feature DBD" ;
          dc:format "text/turtle" ;
          dc:title "Sample Feature Descriptor" ;
          dc:description "Hypothetical Feature DBD in RDF/Turtle" ;
          dc:creator :Cmbm ;
          dc:date "2009-12-18" ;
          dc:language "EN" ;
          :isolation :READ_COMMITED ;
          :versioning "unsupported" ;
          :storage "RDF Triples" ;
          :DML [ a rdf:Bag ;
                 rdf:_1 "RDQL" ;
                 rdf:_2 "SPARQL" ;
               ] .
  45. 45. Desiderata Descriptor
      @prefix : <http://www.lis.ic.unicamp.br/purl/DBD/> .
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix dc: <http://purl.org/dc/elements/1.1/> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .

      :Rodsenra a foaf:Person ;
          foaf:name "Rodrigo Dias Arruda Senra" .

      :DBD2 dc:identifier "DBD2" ;
          dc:type "Desiderata DBD" ;
          dc:format "text/turtle" ;
          dc:title "Sample Desiderata Descriptor" ;
          dc:description "Desiderata DBD for hypothetical App" ;
          dc:creator :Rodsenra ;
          dc:date "2010-01-05" ;
          dc:language "EN" ;
          :isolation :READ_COMMITED ;
          :concurrency "Two phase lock" ;
          :storage "RDF Triples" ;
          :DML "SPARQL" .
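The slides do not spell out how the Negotiator compares the two kinds of descriptor, so the following is only a minimal sketch under a stated assumption: a DBMS satisfies an App when, for every dimension in the Desiderata descriptor, the wanted values are a subset of the values the Feature descriptor offers. The function names `as_set`, `unmet_dimensions`, and `negotiate` are hypothetical; the dimension values are transcribed from the two sample descriptors (DBD1 and DBD2).

```python
from typing import Dict, List, Set, Union

Value = Union[str, Set[str]]
Descriptor = Dict[str, Value]


def as_set(value: Value) -> Set[str]:
    """Normalize a single value or a bag of values to a set."""
    return {value} if isinstance(value, str) else set(value)


def unmet_dimensions(desiderata: Descriptor, feature: Descriptor) -> List[str]:
    """Return the dimensions the App asks for that the DBMS does not cover."""
    unmet = []
    for dimension, wanted in desiderata.items():
        offered = as_set(feature.get(dimension, set()))
        if not as_set(wanted) <= offered:  # assumed rule: subset matching
            unmet.append(dimension)
    return sorted(unmet)


def negotiate(desiderata: Descriptor, registry: Dict[str, Descriptor]) -> List[str]:
    """Return the ids of registered DBMS descriptors that meet every need."""
    return [
        dbms_id
        for dbms_id, feature in registry.items()
        if not unmet_dimensions(desiderata, feature)
    ]


# Dimension values transcribed from the sample descriptors DBD1 and DBD2.
dbd1_feature: Descriptor = {
    "isolation": "READ_COMMITED",
    "versioning": "unsupported",
    "storage": "RDF Triples",
    "DML": {"RDQL", "SPARQL"},
}
dbd2_desiderata: Descriptor = {
    "isolation": "READ_COMMITED",
    "concurrency": "Two phase lock",
    "storage": "RDF Triples",
    "DML": "SPARQL",
}

print(unmet_dimensions(dbd2_desiderata, dbd1_feature))  # ['concurrency']
```

Under this rule DBD1 does not satisfy DBD2, because the Feature descriptor says nothing about concurrency. A real negotiator might instead rank partial matches, which the Extensions slide lists as future work.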
  46. 46. Understanding Hierarchies... (SciFrame, DBDs)
  47. 47. Organographs
  49. 49. Which of the following sets better accommodates the object above?
  50. 50. Red? Triangles? Metric related?
  51. 51. Problems: 1. Single category versus multi-faceted content 2. Manually defined categories 3. Criteria are not explicit 4. Static membership relation 5. Organization is not reusable
  53. 53. Organograph: an artifact that makes explicit how to organize information in the context of a particular task.
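  55. 55. Organograph: $H_{out} = f_{org}(H_{in})$. A hierarchy $H(V, E)$ has content vertices $v_{cnt}$ and aggregator vertices $v_{agg}$, linked by edges $e_{agg}$ and $e_{cnt}$. $f_{org}$ comprises: • navigation (crawler/iterator) • feature extraction • $FHil(v_{agg}, v_{agg})$: hierarchical structuring • $FCat(v_{agg}, v_{cnt})$: categorization. $H_{in}$ and $H_{out}$ are each referenced by a URL.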
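One way to read the definition of forg is as a function built from three pluggable parts: feature extraction, FHil, and FCat. The sketch below is an assumption-laden illustration, not the Organicer implementation: `Hierarchy`, `make_forg`, and `by_extension` are hypothetical names, FHil is modeled as a function returning aggregator-to-aggregator edges, and FCat as a function returning the aggregator for one content item.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class Hierarchy:
    """H(V, E): aggregator and content vertices plus two kinds of edges."""
    v_agg: Set[str] = field(default_factory=set)
    v_cnt: Set[str] = field(default_factory=set)
    e_agg: Set[Edge] = field(default_factory=set)  # (parent aggregator, child aggregator)
    e_cnt: Set[Edge] = field(default_factory=set)  # (aggregator, content)


def make_forg(
    extract: Callable[[str], str],
    f_hil: Callable[[str], Iterable[Edge]],
    f_cat: Callable[[str], str],
) -> Callable[[Hierarchy], Hierarchy]:
    """Compose an organograph function from its three parts."""

    def forg(h_in: Hierarchy) -> Hierarchy:
        h_out = Hierarchy(v_cnt=set(h_in.v_cnt))
        for item in h_in.v_cnt:                      # navigation over content
            features = extract(item)                 # feature extraction
            for parent, child in f_hil(features):    # FHil: hierarchical structuring
                h_out.v_agg.update((parent, child))
                h_out.e_agg.add((parent, child))
            aggregator = f_cat(features)             # FCat: categorization
            h_out.v_agg.add(aggregator)
            h_out.e_cnt.add((aggregator, item))
        return h_out

    return forg


# Toy organograph: regroup files by extension under a single root aggregator.
by_extension = make_forg(
    extract=lambda name: name.rsplit(".", 1)[-1].lower(),
    f_hil=lambda ext: [("all", ext)],
    f_cat=lambda ext: ext,
)

h_in = Hierarchy(
    v_agg={"inbox"},
    v_cnt={"thesis.pdf", "slides.pdf", "notes.txt"},
    e_cnt={("inbox", "thesis.pdf"), ("inbox", "slides.pdf"), ("inbox", "notes.txt")},
)
h_out = by_extension(h_in)
print(sorted(h_out.e_agg))
print(sorted(h_out.e_cnt))
```

The input organization (everything under `inbox`) is discarded and only the content vertices carry over, which matches the idea that an organograph states the organizing criteria explicitly rather than inheriting them.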
  56. 56. Organograph Composition [diagram, built up over slides 56 to 62]: a Task drives forg, which is composed from the parts below.
      Expert roles: NLP, Author, ML, Content, Domain
      Information Extraction: • patterns • dictionaries • rules • probabilities • templates/wrappers
      Similarity algorithms: • matching • dice • jaccard • overlap • cosine
      Ontologies: • FOAF • Dbpedia • Schema.org • Freebase • MusicBrainz • Geonames
      Classifiers: • Naive Bayes • SVM • Nearest Neighbors • LDA • LSI
      Iterators and Data Containers: • Filesystem • Gmail • Evernote • Delicious • Dropbox (DBDs!)
      Visualization Strategies and UX: • Fuse, Dokan • Infoviz • D3
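The five similarity measures named on the composition slide have standard set-based definitions. The sketch below shows them over sets of tokens; treating documents as token sets (rather than weighted vectors) is an assumption made for brevity, and the two example sets are made up.

```python
import math
from typing import Set


def matching(a: Set[str], b: Set[str]) -> int:
    """Matching coefficient: number of shared elements."""
    return len(a & b)


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def overlap(a: Set[str], b: Set[str]) -> float:
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    return len(a & b) / min(len(a), len(b)) if (a and b) else 0.0


def cosine(a: Set[str], b: Set[str]) -> float:
    """Set-based cosine: |A ∩ B| / sqrt(|A| * |B|)."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if (a and b) else 0.0


# Made-up token sets for illustration.
doc_a = {"digital", "information", "hierarchies"}
doc_b = {"information", "hierarchies", "sharing", "web"}

for measure in (matching, dice, jaccard, overlap, cosine):
    print(measure.__name__, round(measure(doc_a, doc_b), 3))
```

All five share the same numerator and differ only in normalization, so swapping one for another in a forg changes how strongly set size influences the grouping.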
  67. 67. Methodology: collection → organize → evaluate → reorganize → share
  72. 72. Evaluating Hierarchies: • too much content • too many aggregators • duplicated or misplaced • too deep
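Three of the four symptoms on this slide can be checked mechanically. The sketch below walks a nested dictionary and reports them; the thresholds, the `evaluate` function, and the example tree are all hypothetical, and "misplaced" content is left out because detecting it needs task-specific criteria that the slide does not give.

```python
from collections import Counter
from typing import Dict, List, Optional

# name -> subtree (an aggregator) or None (a content item)
Tree = Dict[str, Optional[dict]]


def evaluate(tree: Tree, max_content: int = 3, max_aggregators: int = 3,
             max_depth: int = 2) -> Dict[str, List[str]]:
    """Report aggregators that look overloaded, too deep, or hold duplicates."""
    report: Dict[str, List[str]] = {
        "too much content": [],
        "too many aggregators": [],
        "too deep": [],
        "duplicated": [],
    }
    seen: Counter = Counter()

    def visit(name: str, subtree: Tree, depth: int) -> None:
        content = [k for k, v in subtree.items() if v is None]
        aggregators = [k for k, v in subtree.items() if v is not None]
        seen.update(content)
        if len(content) > max_content:
            report["too much content"].append(name)
        if len(aggregators) > max_aggregators:
            report["too many aggregators"].append(name)
        if depth > max_depth:
            report["too deep"].append(name)
        for child in aggregators:
            visit(child, subtree[child], depth + 1)

    visit("/", tree, 0)
    report["duplicated"] = sorted(n for n, count in seen.items() if count > 1)
    return report


# Made-up hierarchy for illustration.
tree: Tree = {
    "papers": {
        "a.pdf": None, "b.pdf": None, "c.pdf": None, "d.pdf": None,
        "old": {"archive": {"2008": {"a.pdf": None}}},
    },
    "notes": {"todo.txt": None},
}

print(evaluate(tree))
```

Duplicates are detected by name only, which is a simplification; comparing content hashes would be a more robust choice in practice.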
  75. 75. Reorganizing Hierarchies [diagram]: the same papers (paper 1, paper 2, paper 3) arranged as Author → Publication Date (Alice, Bob; 2011, 2008, 2011) and as Publication Date → Author (2008, 2011; Alice, Bob, Alice). Task is important!
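Reorganizing by a different facet order can be sketched as regrouping the same items with the facets in a different sequence. In the example below the `organize` function and the `_items` key are hypothetical, and the assignment of authors and years to papers is made up, since the slide text does not preserve which paper belongs where.

```python
from typing import Dict, List, Sequence


def organize(items: List[Dict[str, object]], facets: Sequence[str]) -> dict:
    """Group items into a nested hierarchy following the given facet order."""
    root: dict = {}
    for item in items:
        node = root
        for facet in facets:
            node = node.setdefault(str(item[facet]), {})
        node.setdefault("_items", []).append(item["title"])
    return root


# Hypothetical metadata: the slide does not say which paper has which values.
papers = [
    {"title": "paper 1", "author": "Alice", "year": 2011},
    {"title": "paper 2", "author": "Bob", "year": 2008},
    {"title": "paper 3", "author": "Bob", "year": 2011},
]

by_author_then_year = organize(papers, ["author", "year"])
by_year_then_author = organize(papers, ["year", "author"])

print(by_author_then_year)
print(by_year_then_author)
```

Both hierarchies hold exactly the same papers; which one is better depends on the task, for example browsing one author's output versus reviewing what was published in a given year.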
  78. 78. Reuse Organization: $H_{acm}$, $V_{cnt}^{mine}$
  79. 79. Organograph Execution [diagram]: Hin → Pre-processing → Feature Extraction → Internal Indexes → Transformation Workflow (FCat(), FHil()) → Hout → Visualization
  84. 84. forg_ccs98 (code):
      @organograph
      def forg_ccs98(self, input):
          self.id = new_uuid()  # 'ff7d8e21-4226-11e2-b2f1-109add6b426c'
          self.description = 'docs by ACM CCS98'
          ccs98 = acm_extract('http://www.acm.org/about/class/1998/ccs98.xml')
          trainset = []
          for category, words in nlp_clean_titles(ccs98.Vcnt.paths):
              for w in words:
                  trainset.append((make_feature(w), category))
          classifier = NaiveBayes(trainset)
          self.Ecnt = classifier.classify(input)  # FCat
          self.Eagg = ccs98.Eagg.Level[:1]  # FHil

      input = collection('file:///some/local/dir/docs')
      output = forg_ccs98(input)
      publish(output, 'rodsenra@dropbox:/output')
      organicer.render(output, organicer.views.HYPERBOLIC_TREE)
  85. 85. forg_ccs_98
      Interfacing
        Acquisition
          Discovery: ACM CCS98, Hin
          Extraction: pdf2txt, pdfbox, pypdf; NLTK (tokenizer)
          Transference: HTTP, WebDAV, NFS, SMB
        Publication: Hout: HTML+CSS, JS (Infoviz, D3); Dropbox
      Data Management
        Storage: NoSQL DB (Mongo, Neo4J)
        Manipulation: Indexes (CRDI)
      Information Management
        Description: SKOS, GraphML, JSON
        Transformation
          Mining: NaiveBayes
          Filtering: Vcnt (unconverted pdfs); Vagg (empty or ambiguous)
  86. 86. Related Work
  87. 87. Related Work (SciFrame)
      • CLRC scientific metadata model: B. Matthews and S. Sufi. The CLRC Scientific Metadata Model, version 1, DL TR 02001, CLRC, 2001
      • myGrid Information Model: Sharman, Nick, et al. "The myGrid information model." UK e-Science programme All Hands Conference. 2004.
  88. 88. Related Work (DBDs)
      Madnick and Wang. Evolution Towards Strategic Applications of Databases Through Composite Information Systems. Journal of Management Information Systems 5(2):5-22, 1988
      "In order to separate data from the application processing, it is necessary to employ a process descriptor and a database descriptor. The process descriptor describes the name, the input/output data requirement, and other resource requirements of the processing components. The database descriptor contains information about the data (e.g., data model, schema, access rights) in the database, similar to data dictionaries. These two descriptors can be used by the execution environment to coordinate the interaction between the processing component and the database."
  89. 89. Related Work (Organographs)
      • Topic Modeling: LSA, LDA, Hierarchical Bayesian. Blei 201; Blei, Ng, & Jordan, 2003; Griffiths & Steyvers, 2002; 2003; 2004; Hofmann, 1999; 2001
      • Personal Information Management: CALO, UMEA, X-COSIM, Haystack, UpLib, Iris. Zimmermann 2005; Arndt 2007; Lansdale 1988; Kaptelinin 2003; Janssen & Popat 2003; Karger et al 2003
      • Semantic Desktop: Nepomuk, SEMSOC. Giannakidou et al 2008; Groza et al 2007
      • Personal Digital Libraries: Zotero, Mendeley, Papers
  90. 90. Results
  91. 91. Contributions: • SciFrame • Database Descriptors (DBDs) • Organographs • Software tools & algorithms: WebMAPS, Paparazzi & Organicer
  93. 93. Publications (topics: SciFrame, WebMaps, DBDs, Organographs)
      Submitted to JODS: Evaluating, Reorganizing and Sharing Digital Information Hierarchies. Rodrigo D. A. Senra, Claudia B. Medeiros. Journal on Data Semantics (submitted on 2012-10-25)
      2011: Organographs - Multi-faceted Hierarchical Categorization of Web Documents. Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of the 7th International Conference on Web Information Systems and Technologies - WEBIST: 583-588
      2010: Database Descriptors: Laying the Path to Commodity Web Data Services. Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of Engineering of Computer-Based Systems (ECBS): 386-392
      2009: SciFrame: a conceptual framework to describe data sharing in eScience. Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of the III Brazilian eScience workshop (XXIV SBBD)
      2009: A standards-based framework to foster geospatial data and process interoperability. Gilberto Z. Pastorello Jr., Rodrigo D. A. Senra, Claudia B. Medeiros. Journal of the Brazilian Computer Society 15(1): 13-25
      2008: Bridging the gap between geospatial resource providers and model developers. Gilberto Z. Pastorello Jr., Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of the 16th International Conference on Advances in Geographic Information Systems - ACM SIGSPATIAL
      2007: O projeto WebMAPS: desafios e resultados. Carla G. N. Macário, Claudia B. Medeiros, Rodrigo D. A. Senra. Proceedings of 9th Brazilian Symposium on Geoinformatics - GeoInfo: 239-250
  94. 94. Extensions (theoretical and practical)
      SciFrame: • formalize design pattern • enhance the operations vocabulary • online catalog of eScience systems • describe as ontology (RDF)
      Database Descriptors: • analyse negotiation frameworks • expand DBDs expressivity • explore ranking algorithms • catalog of concrete DBDs • adapt Organicer to use DBDs • experiment with dynamic negotiation
      Organographs: • model with Category Theory • explore DSLs to describe forg • support non-textual media (e.g., images) • expand component palette
  95. 95. Acknowledgments: • Laboratório de Sistemas de Informação (IC-Unicamp) http://www.lis.ic.unicamp.br • Brazilian Institute for Web Science Research http://webscience.org.br • Fapesp - CNPQ - CAPES
  97. 97. Rodrigo Dias Arruda Senra • http://rodrigo.senra.nom.br • rsenra@acm.org. Thank you for your attention.
  98. 98. Support Material
  108. 108. Tools per stage [diagram]: Source Hierarchy → Hierarchy Navigation, Iterator (os.walk, pydelicious, evernote) → Pre-processing (BeautifulSoup, pyPdf) → Extraction (NLTK) → Facet Index (pymongo) → Transformation Workflow (networkx, gensim, numpy, scikit-learn) → Resulting Hierarchy → Visualization (matplotlib, ObsPy, InfoViz.js, D3.js)
  109. 109. Hin, Hout
      • Internal Indexes
      • Pre-processing
      • Feature Extraction
      • Transformation Workflow
      • FCat()
      • FHil()
      • Visualization
  110. 110. [Diagram labels: NLP, Author, ML, Content, Domain Expert, Roles, Ontologies, Classifiers, Information Extraction, Algorithms, Similarity, forg, Visualization Strategies, Iterators, Data Container, UX, Task]
  111. 111. forg:
      • navigation (crawler/iterator)
      • feature extraction
      • FHil(vagg, vagg): hierarchical structuring
      • FCat(vagg, vcnt): categorization
      Hin: URL
      Hout: URL
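The first, second, and fourth steps of forg listed above (navigation, feature extraction, categorization) can be pictured with a small sketch. Everything in it is an assumption made for illustration: the names `iterate`, `extract_feature`, and `categorize`, the use of plain path strings as the input hierarchy, and the file extension as the only feature. It does not model FHil or the `vagg`/`vcnt` arguments, which the slides do not define.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def iterate(paths):
    """Navigation step: yield each item of the input hierarchy (plain path strings here)."""
    for p in paths:
        yield PurePosixPath(p)


def extract_feature(path):
    """Feature extraction step: the lowercase file extension is the only feature used."""
    return path.suffix.lstrip(".").lower() or "unknown"


def categorize(paths):
    """Categorization step: group item names under one node per feature value."""
    out = defaultdict(list)
    for item in iterate(paths):
        out[extract_feature(item)].append(item.name)
    return {category: sorted(names) for category, names in out.items()}


# Hypothetical input hierarchy (Hin) and the regrouped output hierarchy (Hout).
h_in = [
    "papers/2010/dbd.pdf",
    "papers/2011/organographs.pdf",
    "notes/todo.txt",
    "README",
]
h_out = categorize(h_in)
print(h_out)
```

In a real setting the iterator would walk a filesystem, bookmark service, or note store, and the feature extractor would look at content instead of file names.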
  112. 112. <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:dbd="http://www.lis.ic.unicamp.br/purl/DBD">
        <rdf:Description rdf:about="http://www.lis.ic.unicamp.br/purl/DBD/DBD1">
          <!-- metadata -->
          <dc:creator>Claudia Bauzer Medeiros</dc:creator>
          <dc:description>Hypothetical DBD for an RDF DBMS</dc:description>
          <dc:identifier>DBD1</dc:identifier>
          <dc:format>application/rdf+xml</dc:format>
          <dc:type>
            <rdf:Description>
              <dbd:Type>Feature DBD</dbd:Type>
            </rdf:Description>
          </dc:type>
          <dc:title>Descriptor of an RDF DBMS</dc:title>
          <dc:date>2009-12-18</dc:date>
          <dc:language>EN</dc:language>
          <!-- dimensions and values -->
          <dbd:concurrency>Two phase lock</dbd:concurrency>
          <dbd:versioning>unsupported</dbd:versioning>
          <dbd:storage>RDF triples</dbd:storage>
          <dbd:DML>
            <rdf:Bag>
              <rdf:li>RDQL</rdf:li>
              <rdf:li>SPARQL</rdf:li>
            </rdf:Bag>
          </dbd:DML>
        </rdf:Description>
      </rdf:RDF>
  113. 113. <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:dbd="http://www.lis.ic.unicamp.br/purl/DBD">
        <rdf:Description rdf:about="http://www.lis.ic.unicamp.br/purl/DBD/DBD1">
          <!-- metadata -->
          <dc:creator>Rodrigo Dias Arruda Senra</dc:creator>
          <dc:description>Desiderata DBD for an hypothetical application</dc:description>
          <dc:identifier>DBD2</dc:identifier>
          <dc:format>application/rdf+xml</dc:format>
          <dc:type>
            <rdf:Description>
              <dbd:Type>Desiderata DBD</dbd:Type>
            </rdf:Description>
          </dc:type>
          <dc:title>Desiderata descriptor of an hypothetical application</dc:title>
          <dc:date>2010-01-05</dc:date>
          <dc:language>EN</dc:language>
          <!-- dimensions and values -->
          <dbd:concurrency>Two phase lock</dbd:concurrency>
          <dbd:storage>RDF triple store</dbd:storage>
          <dbd:DML>RDQL</dbd:DML>
        </rdf:Description>
      </rdf:RDF>
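One naive way to compare a Feature DBD with a Desiderata DBD such as the two above is to read the `dbd:` dimensions from each and check, dimension by dimension, whether every requested value is offered. The slides do not show the actual matching or negotiation procedure, so the subset rule and the helper names `dimensions` and `match` are assumptions of this sketch; the embedded documents are trimmed to the dimension elements only.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DBD = "http://www.lis.ic.unicamp.br/purl/DBD"

HEADER = f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:dbd="{DBD}"><rdf:Description>'
FOOTER = "</rdf:Description></rdf:RDF>"

# Dimensions copied from the Feature DBD (DBD1) and the Desiderata DBD (DBD2).
FEATURE_DBD = HEADER + """
  <dbd:concurrency>Two phase lock</dbd:concurrency>
  <dbd:versioning>unsupported</dbd:versioning>
  <dbd:storage>RDF triples</dbd:storage>
  <dbd:DML><rdf:Bag><rdf:li>RDQL</rdf:li><rdf:li>SPARQL</rdf:li></rdf:Bag></dbd:DML>
""" + FOOTER

DESIDERATA_DBD = HEADER + """
  <dbd:concurrency>Two phase lock</dbd:concurrency>
  <dbd:storage>RDF triple store</dbd:storage>
  <dbd:DML>RDQL</dbd:DML>
""" + FOOTER


def dimensions(xml_text):
    """Map each dbd: dimension name to the set of values it declares."""
    root = ET.fromstring(xml_text)
    desc = root.find(f"{{{RDF}}}Description")
    dims = {}
    for child in desc:
        if not child.tag.startswith(f"{{{DBD}}}"):
            continue  # skip Dublin Core metadata and anything else
        name = child.tag.split("}", 1)[1]
        bag_items = [li.text.strip() for li in child.iter(f"{{{RDF}}}li")]
        dims[name] = set(bag_items) if bag_items else {child.text.strip()}
    return dims


def match(features, desiderata):
    """Assumed rule: a dimension is satisfied when every requested value is offered."""
    return {name: wanted <= features.get(name, set())
            for name, wanted in desiderata.items()}


result = match(dimensions(FEATURE_DBD), dimensions(DESIDERATA_DBD))
print(result)
```

With exact string comparison the `storage` dimension fails ("RDF triple store" versus "RDF triples") even though the intent looks compatible, which suggests why a shared vocabulary or a negotiation step would be needed on top of a literal comparison.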
  114. 114. NDVI Profiles
  115. 115. Data Management
      • Manipulation: Create, Retrieve, Update, Delete
      • Index
      • Storage
  116. 116. Information Management
      Transformations
      ‣ Browsing
      ‣ Iterating
      ‣ Searching
      ‣ Augmenting
      ‣ Mining
      ‣ Description
      ‣ Annotation
      ‣ Schematization
      ‣ Summarizing
      ‣ Structuring
      ‣ Sorting
      ‣ Merging
      ‣ Decreasing
      ‣ Filtering
      ‣ Fusing
  117. 117. Example
  118. 118. Example
      • Input Collection
      • Task: info extraction
      • Task: transformation
      • Task: visualization
  119. 119. WebMAPS: DataFlow
      • Mail
      • FTP
      • MODIS Reprojection Tool
      • Images
      • Region clipping
      • Geometry (IBGE)
  120. 120. NDVI
  122. 122. Related Work
      Approaches to App-to-DMS binding
        • embedded
        • n-tier client/server (including web services)
        • mediators
        Descriptors are orthogonal to all of these!
      Information Integration [1]
        Process
          • Understanding
          • Standardization
          • Specification
          • Execution
        Mechanism
          • Materialization
          • Federation
          • Indexing
      [1] Laura Haas. Beauty and the Beast: The Theory and Practice of Information Integration.
  123. 123. Sensor Data Extraction
      from osgeo import gdal
      from osgeo.gdalconst import GA_ReadOnly

      dataset = gdal.Open(raster_file, GA_ReadOnly)
      # Get the coefficients of the affine functions that map coordinates
      gt = dataset.GetGeoTransform()
      # Get the data band of interest
      band = dataset.GetRasterBand(1)
      # Identify how the data are encoded.
      # For the TIF file the data are unsigned bytes (Byte)
      data_type = gdal.GetDataTypeName(band.DataType)
      # Get the image dimensions
      width, height = band.XSize, band.YSize
      # Convert the MBR from the lat/long coordinate system to row/column
      # Xgeo = GT(0) + Xpixel*GT(1) + Yline*GT(2)
      # Ygeo = GT(3) + Xpixel*GT(4) + Yline*GT(5)
      ul_pixel, lr_pixel = g2p(gt, *ul_geo), g2p(gt, *lr_geo)
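The slide calls a helper `g2p` without showing it. A minimal sketch of what such a helper might do, assuming it inverts the two affine equations quoted in the comments, follows. The name `geo_to_pixel`, the (row, col) return order (inferred from how `ul_pixel[0]` and `ul_pixel[1]` are used in the raster-reading slide), and the sample geotransform are assumptions, not the author's code.

```python
import math


def geo_to_pixel(gt, x_geo, y_geo):
    """Invert the affine geotransform and return (row, col) of the pixel holding the point.

    Solves for Xpixel (col) and Yline (row) in:
        Xgeo = gt[0] + Xpixel*gt[1] + Yline*gt[2]
        Ygeo = gt[3] + Xpixel*gt[4] + Yline*gt[5]
    """
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x_geo - gt[0], y_geo - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (gt[1] * dy - gt[4] * dx) / det
    return int(math.floor(row)), int(math.floor(col))


# Hypothetical north-up geotransform: origin at (-48.0, -22.0), 0.25-degree pixels.
sample_gt = (-48.0, 0.25, 0.0, -22.0, 0.0, -0.25)
pixel = geo_to_pixel(sample_gt, -46.875, -22.625)
print(pixel)  # (2, 4)
```

For the common north-up case (`gt[2] == gt[4] == 0`) this reduces to two independent divisions; the general form also covers rotated rasters.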
  124. 124. WebMAPS
  125. 125. Case Study: WebMAPS
  127. 127. Data Extraction
      import struct
      import numpy

      def raster2array(ul_pixel, lr_pixel, dtype='B'):
          """Use ul_pixel and lr_pixel to build a numpy array holding
          the region of interest extracted from the raster file."""
          col_size = lr_pixel[1] - ul_pixel[1] + 1
          row_size = lr_pixel[0] - ul_pixel[0] + 1
          scanline = band.ReadRaster(ul_pixel[1], ul_pixel[0], col_size, row_size)
          num_pixels = col_size * row_size
          roi = numpy.array(struct.unpack(dtype * num_pixels, scanline))
          roi.shape = (row_size, col_size)
          return roi

      # Read data from the raster file into a numpy array
      # defining a region-of-interest matrix
      roi = raster2array(ul_pixel, lr_pixel)
  128. 128. Geometry Extraction
      from osgeo import ogr

      shp = ogr.Open(filepath)
      # Layer corresponding to the state of São Paulo
      layer = shp.GetLayerByName('35mu500gc')
      # Feature corresponding to the municipality of Campinas
      feature = layer.GetFeature(501)
      # Extract the control points of the perimeter
      geometry = feature.GetGeometryRef()
      poly = geometry.GetGeometryRef(0)
      centroid = geometry.Centroid()
      centroid_geo = centroid.GetX(), centroid.GetY()
      # Define the Minimum Bounding Rectangle (MBR)
      lg_left, lg_right, lt_bot, lt_up = poly.GetEnvelope()
      ul_geo, lr_geo = (lg_left, lt_up), (lg_right, lt_bot)
  129. 129. Spatial Operations
  130. 130. Organicer
