Big Data Curation And Its Application

626 views
477 views

Published on

Published in: Technology, Education
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
626
On SlideShare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
18
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Big Data Curation And Its Application

  1. 1. Copyright © 2013 Hanmin JungHanmin JungHead of Dept. of Computer Intelligence ResearchKISTIBig Data Curation & Its Application
  2. 2. Copyright © 2013 Hanmin JungLet Me Introduce Myself :-)2
  3. 3. Copyright © 2013 Hanmin JungRecent Social Activities (2012~)보건복지부 빅데이터 기술 전문가 패널한국연구재단 융합연구우수성평가위원회 위원국립전파연구원 방송통신표준 전문위원기술표준원 빅데이터 표준화 프레임워크 표준연구회 위원ITRC 과제기획위원회 빅데이터 분과, UI/UX 분과 기획위원/기술간사교육과학기술부 과학기술 빅데이터 포럼 전문가위원회 간사국가경쟁력강화위원회 ‘범부처 Giga KOREA 사업’ 전문가 위원회 위원국가과학기술위원회 기술영향평가위원회 위원방송통신위원회 빅데이터 포럼 인력양성분과 위원장/자문위원/운영위원IBS 장비심의위원회, 지식경제부 중앙장비심의위원회 심의위원한국과학창의재단 과학창의앰배서더지식경제부 ‘SW+인문 포럼’ 멘토IT VC 전문성 강화 프로그램 전문강사ISO/IEC JTC1/SC32 전문위원회 위원, JTC 1/SC34 표준개발위원회 위원ISO 20022 TC68 Advisory Expert한국정보통신기술협회 정보통신표준화위원회 –디지털콘텐츠/SW품질평가/메타데이타/사물지능통신 프로젝트 그룹 위원교육과학기술부 차세대정보컴퓨팅분과 – 기획위원국무총리실 정보지식인대회 출제/평가위원문화체육관광부 공공문화정보 전문가 위원Let Me Introduce Myself :-)3
  4. 4. Copyright © 2013 Hanmin JungWeekly IT Trends (NIPA)과학기술빅데이터동향및활용방안(Vol. 1593)모바일비즈니스인텔리전스기술동향(Vol. 1573)시맨틱소셜네트워크를구성하는온톨로지어휘기술현황(Vol. 1560)정부빅데이터분석기술적용동향(Vol. 1556)제품및기술수명주기활용동향(Vol. 1550)개체식별관점에서바라본링크드데이터동향(Vol. 1524)유망기술발굴모델및서비스동향(Vol. 1521)포지셔닝의측면에서살펴본모바일태블릿컴퓨터동향(Vol. 1486)테크놀로지인텔리전스동향(Vol. 1477)추론기술연구동향(Vol. 1446)트리플레파지토리벤치마킹(Vol. 1439)차세대IT 기기와HCI 기술동향전망(Vol. 1435)시맨틱검색기술동향(Vol. 1431)시맨틱웹국내특허동향(Vol. 1420)감성분석과브랜드모니터링기술동향(Vol. 1396)시맨틱웹이경제⋅사회에미치는영향(Vol. 1372)웹매핑서비스비교분석(Vol. 1352)시맨틱웹2.0 기술동향(Vol. 1344)국내포털검색시장및특허동향(Vol. 1341)Open API 기술동향(Vol. 1296)엔터프라이즈검색기술동향(Vol. 1276)전자상거래검색기술동향(Vol. 1273)시맨틱웹포털기술동향(Vol. 1264)Trend Reports4
  5. 5. Copyright © 2013 Hanmin JungArt Curator5http://www.chrysler.org/files/resources/gallery-talk-3.jpg
  6. 6. Copyright © 2013 Hanmin Jung6Press
  7. 7. Copyright © 2013 Hanmin JungDefinitionActive management of data over its lifecycleEnsuring that data is trustworthy, discoverable, accessible, reusable, andfit for useE.g. data preparation for analyticsCuration TypesManual curation(Semi-)automatic curationE.g. data cleansing, record duplication, classificationSheer curation (curation at source)Integrated in workflow for creating and managing dataData Curation7E. Curry, A. Freitas, and S. O’Riain, “Data Curation at the New York Times”, 2011.
  8. 8. Copyright © 2013 Hanmin JungSteps to Setup the ProcessIdentify what data you need to curateIdentity who will curate the dataDefine the curation workflowIdentify appropriate data-in & data-out formatsIdentity the artifacts, tools, and processes needed to support the processData Curation8E. Curry, A. Freitas, and S. O’Riain, “Data Curation at the New York Times”, 2011.
  9. 9. Copyright © 2013 Hanmin JungCase Studies91. Apple Maps2. Strategic Foresight3. Big Data
  10. 10. Copyright © 2013 Hanmin Junghttp://cdn0.sbnation.com/entry_photo_images/5667445/20120920-DSC_7192VERGE_large_verge_medium_landscape.jpgApple Maps10
  11. 11. Copyright © 2013 Hanmin JungGoogle Maps vs. Apple Maps11http://www.gadgetreview.com/2013/01/google-maps-vs-apple-maps-ios-comparison.html
  12. 12. Copyright © 2013 Hanmin JungApple Sirihttp://www.apple.com/ios/siri/12
  13. 13. Copyright © 2013 Hanmin Jung13We can see it as much about itas we knowhttp://investor.google.com/financial/tables.htmlGoogle’s Financial Table
  14. 14. Copyright © 2013 Hanmin Jung14Nokia’s Burning Ships Strategyhttp://www.brightsideofnews.com/Data/2011_5_5/Steven-Elop-Burn-Boats-Strategy-is-a-Cortez-Move-for-Nokia/Hernando_Cortez_BurningBoat.jpg
  15. 15. Copyright © 2013 Hanmin JungResult of the StrategyWorldwide Market ShareWorldwide mobile device sales to end users in 2008 ~ 2013Gartner, IDC Worldwide Mobile Phone Tracker38.6, 117.937.8, 108.531.6,110.427.1, 106.618.7, 82.914.8, 61.9Nokia3.5, 12.14.9, 19.13.1, 13.73.2, 13.5ZTE418.641.9, 175.43.7, 15.48.9, 37.427.5, 115.01Q2013(%, M. Units)Company3Q2012(%, M. Units)3Q2011(%, M. Units)3Q2010(%, M. Units)3Q2009(%, M. Units)3Q2008(%, M. Units)Samsung 23.7, 105.4 22.3, 87.8 20.5, 71.4 21.0, 60.2 17.0, 52.0Apple 6.1, 26.9 4.3, 17.1 4.0, 14.1LG 3.1, 14.0 5.4, 21.1 8.1, 28.4 11.0, 31.6 7.5, 23.0Sony Ericsson 4.9, 14.1 8.4, 25.7Motorola 4.7, 13.6 8.3, 25.4Others 45.3, 201.6 36.1, 142 32.2, 112.5 20.6, 59.1 20.1, 61.5Total 444.5 393.7 348.9 287.1 305.415
  16. 16. Copyright © 2013 Hanmin Jung16Sign of the Resulthttp://www.google.com/insights/search/Google Insights for Search
  17. 17. Copyright © 2013 Hanmin JungStrategic ForesightR. Rohrbeck, H. Arnold, and J. Heuer, “Strategic Foresight in Multimedia Enterprises”, 2007.17
  18. 18. Copyright © 2013 Hanmin Jung18Hype CycleEmerging Technologies Hype Cycle 2012
  19. 19. Copyright © 2013 Hanmin Jung19Social Datahttp://bynoy.files.wordpress.com/2011/08/united-noy-weblife-60-seconds.jpg
  20. 20. Copyright © 2013 Hanmin Jung20Machine DataT. Baer, “What is Big Data? The Reality for Analytics”, OVUM, 2011.Call data recordsCall data recordsSensory dataSensory dataWeb log filesWeb log filesFinancial Instrument TradeFinancial Instrument Trade
  21. 21. Copyright © 2013 Hanmin Jung21Crop Yield Estimationhttp://www.geovar.com/data/satellite/rapideye/rapideye_sample_image_bands543.jpg
  22. 22. Copyright © 2013 Hanmin Jung22Crop Yield Estimationhttps://www.agriskmanagementforum.org/content/improved-supply-intelligence-global-agriculture-marketsSouth American soybean production forecastsSouth American soybean production forecasts
  23. 23. Copyright © 2013 Hanmin Jung23Soybeans Futures (CME)
  24. 24. Copyright © 2013 Hanmin Jung24Data Mining MethodsA. Cheung, “Forecast Anything! The Seven Data Mining Models”Decision TreesNaive BayesCluster AnalysisSequence ClusteringAssociation RulesTime SeriesNeural Networks
  25. 25. Copyright © 2013 Hanmin Jung25Value PyramidSearchClusteringExtractingDecisionSupportForecastingScenarioPlanningAdvisingModified from D. Bousfield & P. Fooladi, “STM Information: 2009 Final Market Size and Share Report”, 2010.InSciTe Advanced (2011)InSciTe Adaptive (2012)InSciTe Advisory (2013~)1st Target of Data Curation
  26. 26. Copyright © 2013 Hanmin JungData Curationfor Getting Insights & Foresights261. Crawling Data2. Extracting Information (Text Mining)3. Resolving Identities
  27. 27. Copyright © 2013 Hanmin JungCrawling DataCrawling Full Texts from Google Scholar27
  28. 28. Copyright © 2013 Hanmin JungCrawling DataCrawling Web Data by RSS & Google API28
  29. 29. Copyright © 2013 Hanmin JungCrawling DataWeb Data Statistics in InSciTe Adaptive29Source/Year 2001 … 2008 2009 2010 2011 2012 TotalIDC 0 … 1 13 21 313 250 598Wikipedia 0 … 151,942 181,721 408,017 1,555,867 249,0209 4,975,178InformationWeek 0 … 470 551 536 388 81 2,316Gizmag 0 … 1,371 2,301 2,970 2,850 2,520 16,731Technologyreview 89 … 646 693 758 873 405 5,330IEEE spectrum 64 … 468 426 385 306 205 3,173Technewsworld 31 … 696 742 1,396 2,332 2,485 9,843DiscoverMagazine 2 … 104 51 79 57 49 606NewYork Times 20,940 … 7,979 4,696 3,971 3,669 2,064 124,100BBC 3,155 … 2,687 3,158 3,318 5,964 4,243 37,073Fox News 0 … 1,577 1,965 1,123 2,015 1,872 10,749CNN 3,594 … 1,154 1,499 1,071 1,308 814 19,769Thomson Reuters 0 … 450 371 309 338 222 3,408USA Today 458 … 4,616 3,878 4,630 6,579 4,276 38,750EtnTws.com 447 … 1,964 2,241 2,099 1,585 844 14,259Total 28,780 … 176,125 204,306 430,683 1,584,444 2,510,539 5,261,883
  30. 30. Copyright © 2013 Hanmin JungText Mining – Concept30
  31. 31. Copyright © 2013 Hanmin Jung31Mining TargetsNamed entities: e.g. PLOsPattern-based entities: e.g. a-mail addresses, phone numbersConcepts: abstractions of entitiesFacts and relationshipsConcrete and abstract attributes: e.g. 10-year, expensive, comfortableSubjectivity in the forms of opinions, sentiments, and emotions: e.g.positive/negative, angryS. Grimes, “Text Analytics Overview: Technology, Solutions, Market”, 2011.Text Mining – Concept
  32. 32. Copyright © 2013 Hanmin Jung32Text Mining – SINDI Architecture
  33. 33. Copyright © 2013 Hanmin JungText Mining – Example33
  34. 34. Copyright © 2013 Hanmin JungText Mining – Example34
  35. 35. Copyright © 2013 Hanmin JungText Mining – Example35
  36. 36. Copyright © 2013 Hanmin JungText Mining – Example36
  37. 37. Copyright © 2013 Hanmin JungText Mining – Example37
  38. 38. Copyright © 2013 Hanmin JungText Mining – Application38
  39. 39. Copyright © 2013 Hanmin JungResolving IdentitiesChristian Bizer, Tom Heath, and Tim Berners-Lee, “Linked Data – The Story So Far”, 2009.URI AliasesURIs that refer to the same real-world objectsE.g. http://dbpedia.org/resource/Berlin (for Berlin in DBpedia)E.g. http://sws.geonames.org/2950159 (for Berlin in Geonames)Information providers can set owl:sameAs links to URI aliases they knowaboutResolution of Data Conflicts in Data FusionChoosing a value in situations where multiple sources provide differentvalues for the same property of an object39
  40. 40. Copyright © 2013 Hanmin Jung40Resolving IdentitiesExample소녀시대, 소시, 少女時代, しょうじょじだい, 少時, Sho Jo Ji Dai, So NyeoShi Dae, SoShi, Girl’s Generation, SNSD, So Nyuh Shi Dae, …
  41. 41. Copyright © 2013 Hanmin JungResolving Identitieshttp://sindice.com/search?q=Hanmin+Jung41
  42. 42. Copyright © 2013 Hanmin Jung42OntoURIResolverDemonstration
  43. 43. Copyright © 2013 Hanmin JungLOD Projecthttp://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.htmlLinked Data31 billion RDF triples, 504 RDF links (2011.9)43
  44. 44. Copyright © 2013 Hanmin JungInSciTe Advanced (2011)44
  45. 45. Copyright © 2013 Hanmin Jung45InSciTe Adaptive (2012)https://play.google.com/store/apps/details?id=net.xenix.inscite&feature=search_result#?t=W10.
  46. 46. Copyright © 2013 Hanmin Jung46InSciTe Adaptive (2012)
  47. 47. Copyright © 2013 Hanmin JungInSciTe Adaptive (2012)Data Fact SheetArticles: 22.6 millions (9.8 millions for papers, 7.6 millions for patents, 5.3millions for Web data)All technical areas (2001~2011)Named entities: 1.9 millionsAuthority dictionary: 1.5 millions entriesLOD data: 290 GB (are being connected)47
  48. 48. Copyright © 2013 Hanmin JungPress48
  49. 49. Copyright © 2013 Hanmin JungInSciTe ArchitectureAnalytics ModelsETD ModelEmerging Technology Discovery ModelTLCD ModelTechnology Life Cycle Discovery ModelTLC ModelTechnology Life Cycle ModelOntoRelFinder®Relationship Path FinderOntoReasoner®Reasoning EngineOntoURI®Semantic Knowledge ManagerOntoPipeliner®Semantic Service ComposerSS&AESemantic Search & Analytics EngineOntoURIResolver®Identity ResolverSINDI-CORE/LINKEntity & Relationship ExtractorTUC ModelTerminology Use Cycle ModelOntologyLinked DataOntoFrameOntoVerifier®Reasoning VerifierWeb Data CrawlerRSS/Google APIWeb DataLiteratures49
  50. 50. Copyright © 2013 Hanmin JungOntologyDBOntology SchemaOntology InstancesRDF TriplesPortability & ConnectibilityFor Storing & ManagingFor ServicePlanning ServicesDefining ConceptsExploiting Relations50
  51. 51. Copyright © 2013 Hanmin JungBig Data Processing in InSciTe51
  52. 52. Copyright © 2013 Hanmin Jung52InSciTe Homepagehttp://inscite.kisti.re.kr
  53. 53. Copyright © 2013 Hanmin Jung53Thank youjhm@kisti.re.kr“A lot of times, people don’t know what they want until you show it to them.”by Steve Jobs“Many people won’t be convinced until they’ve seen it for themselves.”by Jakob Nielsen

×