WebLab, open source media mining platform, OW2con'12, Paris

The Web is large, and information is present in many forms. Complex techniques are needed to discover the hidden structure of content, and a single software provider cannot be an expert in all of them. An integration platform is therefore a natural solution, since it enables the use of the best tools for each function. This presentation first covers the challenges of open source intelligence (OSINT) and its growing importance, then details the WebLab approach to building flexible and scalable OSINT applications that match the fast-paced nature of OSINT. From semantic data models to the high-level architecture, by way of the selected technologies, the presentation gives a complete tour of the WebLab project.

WebLab, open source media mining platform, OW2con'12, Paris

  1. Open source media mining platform. Gérard Dupont, Research engineer, COEDS2, Advanced studies. OW2Con12, November 28-29, 2012, Orange Labs, Paris. www.ow2.org.
  2. Media mining platform. From unstructured data from any source... to structured and actionable knowledge.
  3. [Image: 2005 - http://www.opte.org]
  4. OSINT challenges. Some activities need to be automated: search and source assessment; data acquisition; classification, screening, indexing; information retrieval; knowledge capitalisation; visualization; summary; alert. Some activities cannot be automated: expert analysis of content; linking and mapping heterogeneous information; evaluating reliability and assessing information; reporting and synthesis of information. → Tools can provide support, but keep the human in the loop.
  5. A processing workflow. [Diagram] Media types: video, audio, voice audio, text. Steps: collect; segmentation; audio extraction from video; audio clean-up; audio transcription (Sphinx); translation; information extraction; alert. Outputs: enriched (annotated) text and translated text. Sample text: "An international Greenpeace alpine team delivers messages of support and hope for the victims of the nuclear disaster at Fukushima Daiichi to the summit of Mt Fuji. Collected from thousands of people in Japan and all over the world, Greenpeace hopes that these messages will help unite the people of Japan in opposition to nuclear power." Sample translation: "国際グリーンピース高山チームは富士山の頂上への支援と福島第一に原子力災害の被害者のための希望のメッセージを配信します。日本と世界中の何千人もの人々から収集した、グリーンピースは、これらのメッセージは、原子力発電に反対する日本の人々を団結に役立つ、日本当局はそれらに耳を傾けることを奨励することを期待しています。"
  6. Integration approach. A platform providing "plug & play" functionalities for the integration of tools for collection, processing, analysis and communication...
  7. Technology stack. Components: Java application server (Apache Tomcat), enterprise service bus, portal/portlets, content store, database, maps server. Standards and technologies: SOA, ESB, JBI, JSR168, BPEL, WSDL, SOAP, OWL, RDFS, RDF, SPARQL, XML Namespaces, XSD, XPath, XQuery, URI, UTF-8.
  8. Standard model & interfaces
  9. Standard model & interfaces
  10. Application per domain
  11. Application per domain
  12. Application per domain
  13. WebLab: a mature project. Sample application: local data collection; simple information extraction; text index/search. 24 components, including: 8 technical services; 9 services; 7 portlets. Core platform, including: data model; ESB; portal; orchestrator.
  14. WebLab: a mature project. Technical infrastructure: code (SVN); bug tracking (JIRA); daily build (Bamboo); code quality (Sonar); mailing list (8 members). Available tools: Maven plugin; Eclipse wizard; SoapUI test library; CLI test tools; complete bundle; ...
  15. Thanks for your attention. Take away: [weblab.ow2.org]. Logos and names of the tools presented are the property of their respective providers and appear here only to illustrate the technologies already integrated in WebLab. Neither CASSIDIAN nor EADS claims any paternity of these external tools. HERITRIX - http://crawler.archive.org; GOOGLE TRANSLATE - http://translate.google.com/; FFMPEG - http://ffmpeg.org/; GATE - http://gate.ac.uk/; SPHINX - http://cmusphinx.sourceforge.net/sphinx4/; JENA - http://jena.apache.org/
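
The processing workflow (slide 5) and the "plug & play" integration approach (slide 6) can be illustrated with a minimal sketch. It is not WebLab code: the names `Resource`, `Service` and `run_chain` are hypothetical, and the three services are toy stand-ins for real tools such as a transcriber, a translator and an information extractor. The only idea taken from the slides is that independent tools share one data model and are chained by an orchestrator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: this is an illustration of the idea,
# not WebLab's actual data model or service interfaces.


@dataclass
class Resource:
    """A unit of content that accumulates annotations as it is processed."""
    content: str
    media_type: str = "text"
    annotations: Dict[str, object] = field(default_factory=dict)


# Every pluggable service has the same shape: Resource in, Resource out.
Service = Callable[[Resource], Resource]


def transcribe(res: Resource) -> Resource:
    # Stand-in for speech-to-text: the "audio" content is already text here.
    if res.media_type == "audio":
        res.annotations["transcript"] = res.content
        res.media_type = "text"
    return res


def translate(res: Resource) -> Resource:
    # Stand-in for machine translation: a toy word-for-word glossary.
    glossary = {"bonjour": "hello", "monde": "world"}
    words = res.content.lower().split()
    res.annotations["translation"] = " ".join(glossary.get(w, w) for w in words)
    return res


def extract_entities(res: Resource) -> Resource:
    # Stand-in for information extraction: keep capitalised words.
    res.annotations["entities"] = [w for w in res.content.split() if w[:1].isupper()]
    return res


def run_chain(res: Resource, services: List[Service]) -> Resource:
    """Minimal orchestrator: apply each service in order."""
    for service in services:
        res = service(res)
    return res


if __name__ == "__main__":
    result = run_chain(
        Resource("Bonjour Monde", media_type="audio"),
        [transcribe, translate, extract_entities],
    )
    print(result.annotations)
```

Because every service shares the same signature, replacing one tool with another only changes the list passed to `run_chain`, which is the property the "plug & play" slide describes.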
