Hosted by BDV PPP. BigDataStack, I-BiDaaS, Track & Know and Policy Cloud join forces in a series of online demonstrations of innovative Big Data Technologies unlocking the potential of various applications.
The new data-driven industrial revolution highlights the need for big data technologies to unlock the potential in various application domains. To this end, BDV PPP projects I-BiDaaS, BigDataStack, and Track & Know deliver innovative technologies to address the emerging needs of data operations and applications. To fully exploit the sustainability and take full advantage of the developed technologies, the projects onboarded pilots that exhibit their applicability in a wide variety of sectors. In the Big Data Pilot Demo Days, the projects will showcase the developed and implemented technologies to interested end-users from the industry as well as technology providers, for further adoption.
Transforming the European Data Economy: A Strategic Research and Innovation A... | Edward Curry
Transforming the European Data Economy: A Strategic Research and Innovation Agenda
Keynote at European Data Forum 2016
Prof. Dr. Milan Petković, Vice President BDVA, Philips
Dr. Edward Curry, Vice President BDVA, Insight
Towards Unified and Native Enrichment in Event Processing Systems | Edward Curry
Events are encapsulated pieces of information that flow from one event agent to another. In order to process an event, additional information that is external to the event is often needed. This is achieved using a process called event enrichment. Current approaches to event enrichment are external to event processing engines and are handled by specialized agents. Within large-scale environments with high heterogeneity among events, the enrichment process may become difficult to maintain. This paper examines event enrichment in terms of information completeness and presents a unified model for event enrichment that takes place natively within the event processing engine. The paper describes the requirements of event enrichment and highlights its challenges such as finding enrichment sources, retrieval of information items, finding complementary information and its fusion with events. It then details an instantiation of the model using Semantic Web and Linked Data technologies. Enrichment is realised by dynamically guiding a spreading activation algorithm in a Linked Data graph. Multiple spreading activation strategies have been evaluated on a set of Wikipedia events and experimentation shows the viability of the approach.
Dealing with Semantic Heterogeneity in Real-Time Information | Edward Curry
Tutorial at the EarthBiAs 2014 Summer School on Dealing with Semantic Heterogeneity in Real-Time Information
Part I: Large Scale Open Environments
Part II: Computational Paradigms
Part III: RDF Event Processing
Part IV: Theory of Event Exchange
Part V: Approaches to Semantic Decoupling
Part VI: Example Application: Linked Energy Intelligence
Linked Water Data For Water Information Management | Edward Curry
The management of water consumption is hindered by low general awareness and the absence of precise historical and contextual information. Effective and efficient management of water resources requires a holistic approach considering all the stages of water usage. A decision support tool for water management services requires access to a number of different data domains and different data providers. The design of next-generation water information management systems poses significant technical challenges in terms of information management, integration of heterogeneous data, and real-time processing of dynamic data. Linked Data is a set of web technologies that enables integration of different data sources. This work investigates the usage of Linked Data technologies in the Water Management domain, describes the fundamental concepts of the approach, details an architecture, and discusses possible water management applications.
BigDataPilotDemoDays - I-BiDaaS Application to the Manufacturing Sector Webinar | Big Data Value Association
The new data-driven industrial revolution highlights the need for big data technologies to unlock the potential in various application domains. To this end, BDV PPP projects I-BiDaaS, BigDataStack, Track & Know and Policy Cloud deliver innovative technologies to address the emerging needs of data operations and applications. To fully exploit the sustainability and take full advantage of the developed technologies, the projects onboarded pilots that exhibit their applicability in a wide variety of sectors. In the Big Data Pilot Demo Days, the projects will showcase the developed and implemented technologies to interested end-users from the industry as well as technology providers, for further adoption.
Watch full webinar here: https://bit.ly/3qyTmTB
BITanium enables organizations to maximize the value of their data and improve consistency and data quality with organization-wide data governance. Tune in to this session to hear two brief customer stories of companies that have gained significant value by working with BITanium and Denodo.
How real is multi-cloud for enterprises? Challenges of multi-cloud architecture | Denodo
Watch full webinar here: https://bit.ly/3FVU5VH
While some organizations are planning or establishing a multi-cloud strategy, for many it is still a far cry, given the underlying complexity and associated risks. Listen to this panel of experts debate how real a multi-cloud strategy is for organizations and how a logical data fabric may offer some needed help.
Sustainable IT for Energy Management: Approaches, Challenges, and Trends | Edward Curry
An invited talk to the Galway-Mayo Institute of Technology on the current state of the art in Sustainable IT for energy management, the challenges, and the emerging trends.
The Role of Community-Driven Data Curation for Enterprises | Edward Curry
With increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy. Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance. This chapter provides an overview of data curation, discusses the business motivations for curating data and investigates the role of community-based data curation, focusing on internal communities and pre-competitive data collaborations. The chapter is supported by case studies from Wikipedia, The New York Times, Thomson Reuters, Protein Data Bank and ChemSpider, from which best practices for both social and technical aspects of community-driven data curation are described.
E. Curry, A. Freitas, and S. O’Riáin, “The Role of Community-Driven Data Curation for Enterprises,” in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25-47.
Citizen Actuation For Lightweight Energy Management | Edward Curry
In this work, we build on the concept of citizen sensors and introduce the theory of citizen actuation. Citizen sensors observe, report, and collect data; we propose that by supporting these citizen sensors with methods to affect their surroundings, we enable them to become citizen actuators. We outline a use case for citizen actuation in the Energy Management domain, propose an architecture (a Cyber-Physical Social System) built on previous work in Energy Management with Twitter integration and Complex Event Processing (CEP), and perform an experiment to test this theory. We motivate the need for citizen actuation in Building Management Systems due to the high cost of actuation systems. We define the concept of citizen actuation and outline an experiment that shows a reduction in average energy usage of 24%. The experiment supports the concept of citizen actuation to improve energy usage within the experimental environment, and we discuss future research directions in this area.
Borqs Technologies Inc operates as a software development company. It is engaged in software, development services, and products providing customizable, differentiated, and scalable Android-based smart connected devices and cloud service solutions. The company's segments include Yuantel and Connected Solution. Borqs derives most of its revenues from its Connected Solution which includes Software and Hardware.
IDS: Update on Reference Architecture and Ecosystem Design | Boris Otto
This presentation motivates the Industrial Data Space and gives an update on the IDS Reference Architecture Model as well as the related ecosystem. It sets data in the context of business model innovation and points out how the IDS Reference Architecture relates to alternative data architecture styles such as data lakes and blockchain technology, for example. The presentation was given at the IDSA Summit on March 22, 2018.
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 8) | Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on July 2, 2015, at 'Swiss Re' in Adliswil, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Virtual Reality Training in Smart Factory A Perspective View | ijtsrd
The paper presents a prototype Virtual Reality system for learning how to operate in a smart factory that works according to the concept of Industry 4.0. The Smart Factory lab, which represents part of a smart factory, is described. The VR prototype system and the process of building it are also presented, starting from digitalization of the real smart factory, through logic programming, to the connection of peripheral VR devices. Features of the training system are presented, along with directions for future studies and development.
Akash Parmar | Mohit Singh Tomar | Dr. Ritu Shivastava "Virtual Reality Training in Smart Factory: A Perspective View" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-6, October 2021, URL: https://www.ijtsrd.com/papers/ijtsrd47482.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/47482/virtual-reality-training-in-smart-factory-a-perspective-view/akash-parmar
International Data Spaces: Data Sovereignty for Business Model Innovation | Boris Otto
This presentation, given at the European Big Data Value Forum on November 13, 2018, in Vienna, introduces International Data Spaces (IDS) as a reference architecture and implementation for data sovereignty. The IDS architecture rests on usage control technologies and trusted computing environments and thus forms a strategic enabler for a fair data economy that respects the interests of the data owners.
Data-Centric Business Transformation Using Knowledge Graphs | Alan Morrison
From a talk at the Data Architecture Summit in Chicago in 2018. The talk reviews digital transformation and what deep transformation really implies at the data layer. Cross-enterprise knowledge graphs are becoming feasible and can be a key enabler of deep transformation.
Improving Policy Coherence and Accessibility through Semantic Web Technologie... | Edward Curry
The complexity, volume and diversity of government policies and regulations place a significant burden on both the complying parties and government itself. On the one hand, businesses, civil organizations and other societal entities are required to simultaneously comply with and interpret different and possibly conflicting or inconsistent regulations. On the other hand, government as a whole must ensure policy and regulatory coherence across its various policy domains. While the recent wave of open government initiatives has led to significantly more public access to these documents, features that allow cross-referencing related documents and linking to less formal documents or comments on other media, which are more understandable and accessible to the public, are rarely if ever available today. As a solution to this challenge, we propose an Open Government-wide Policy and Regulation Information Space consisting of documents that are “semantically” annotated and cross-linked to other documents in the information space as well as to external resources such as interpretations, comments and blogs on the social web.
Our approach is three-fold. First, we identify the requirements for the infrastructure. Second, we elaborate a Reference Architecture identifying the various elements needed within the infrastructure. Third, we show how such infrastructure may be realised as a linked data portal where policies and regulations are published as linked open data. Finally, we present a case study involving environmental policy and regulations, discuss the potential impact of such infrastructure on the coherence and accessibility of policies and regulations, and conclude with challenges associated with provisioning a linked open policy and regulatory information infrastructure.
Do you often come across the term digital thread but don't know exactly what it is? Are you curious how companies use it? Read our presentation, which we gave at this year's Simonyi Conference!
International Data Spaces: Data Sovereignty and Interoperability for Business... | Boris Otto
This presentation was held in a workshop session on IoT Business Models and Data Interoperability at the Max Planck Institute for Innovation and Competition in Munich on 8 October 2018. The presentation introduces the concept of business ecosystems and the role of data within them, then outlines the state of the art in terms of interoperability and sovereignty, and finally sketches the IDS contribution.
Data Virtualization. An Introduction (ASEAN) | Denodo
Watch full webinar here: https://bit.ly/3uiXVoC
What is Data Virtualization and why should you care? In this webinar we intend to help you understand not only what Data Virtualization is but also why it is a critical component of any organization's data fabric and how it fits. You will also see how data virtualization liberates and empowers your business users, from data discovery and data wrangling to the generation of reusable reporting objects and data services. Digital transformation demands that we empower all consumers of data within the organization, and it demands agility too. Data Virtualization gives you meaningful access to information that can be shared by a myriad of consumers.
Watch this session on demand to learn:
- What is Data Virtualization?
- Why do I need Data Virtualization in my organization?
- How do I implement Data Virtualization in my enterprise? Where does it fit?
NIIT and Denodo: Business Continuity Planning in the times of the Covid-19 Pa... | Denodo
Watch: https://bit.ly/349QjYr
Currently, the most common Analytical Solutions are implemented on large scalable ecosystems which involve massive Data Lakes and Data Warehouses. These solutions take time to build and incur substantial TCO. In today’s environment we need rapid technologies, and NIIT has developed a compelling solution powered by Denodo’s Data Virtualization and Data Catalog.
How manufacturers are evolving toward Industry 4.0 with virt... | Denodo
Watch full webinar here: https://bit.ly/3cbpipB
Manufacturing is one of the sectors where digital transformation is having the most disruptive effect. Leaders in the manufacturing sector are betting on Big Data, cloud computing, artificial intelligence and the Internet of Things (IoT), among other technologies, and are looking ahead to the arrival of 5G, in order to:
- Automate processes efficiently, to enable higher output in less time
- Create added value in manufactured products
- Connect the industrial plant with the point of sale
- Drive real-time analysis of data from different production lines
However, to achieve these goals and carry out this technological revolution, also known as Industry 4.0, manufacturers have to face a series of considerable challenges. The industrial sector generates more data than any other in the world, and in the digital era the velocity, diversity and exponentially growing volume of data can overwhelm traditional IT architectures. In addition, most manufacturers face data silos, which make processing data slow and costly. They therefore need a reliable IT platform that can integrate, centralize and analyze data from different sources and in different formats in an agile and secure way, putting information at the service of the business.
The experts from Enki and Denodo invite you to this webinar to discover what data virtualization is and why industry leaders are betting on this innovative technology to optimize their IT strategy and achieve a significant ROI through faster, simpler and unified access to industrial data.
Bridging the Last Mile: Getting Data to the People Who Need ItDenodo
Watch full webinar here: https://bit.ly/3cUA0Qi
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIBig Data Week
Charles Cai has more than two decades of experience and a track record of global transformational programme deliveries, from vision and evangelism to end-to-end execution, in global investment banks and energy trading companies. There he excels at designing and building innovative, large-scale Big Data systems in high-volume, low-latency trading, global Energy Trading & Risk Management, and advanced temporal and geospatial predictive analytics, as Chief Front Office Technical Architect and Head of Data Science. He is also a frequent speaker at Google Campus, Big Data Innovation Summit, Cloud World Forum, Data Science London, QCon London, MoD CIO Symposium and others, promoting knowledge and best-practice sharing with audiences ranging from developers and data scientists to CXO-level senior executives from both IT and business backgrounds. He has in-depth knowledge of and experience with the Scala, Python, C# / F#, C++, Node.js, Java, R and Haskell programming languages in Mobile, Desktop, Hadoop/Spark, Cloud IoT/MCU and Blockchain settings, and holds TOGAF9, EMC-DS, AWS CNE4 and other certifications.
IDC Portugal | Como Libertar os Seus Dados com Virtualização de DadosDenodo
Watch full webinar here: https://bit.ly/3w1LoDi
Data has become the most critical asset for any company to succeed in this era of digital transformation.
In this session, Paul Moxon of Denodo will explain how data virtualization works and how it can help organizations respond better to business needs by integrating data from multiple data sources, while minimizing cost and time and increasing the amount of data made available overall.
For a better understanding, Mariana Pinto of Passio Consulting will present a live demonstration of the Denodo Platform.
This talk is about data-driven transformation and its contribution to digital transformation. The first part shows the necessity of adopting the "software revolution" to adapt constantly to the customer’s environment. I then speak about "Exponential Information Systems" that are the foundation for the data-driven ambitions: enterprise-wide flows, customer-time data freshness, future-proof unified semantics, etc.
The last part talks about Exponential Technologies, such as artificial intelligence and machine learning, to drive more value from data.
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Denodo
Watch full webinar here: https://bit.ly/34iCruM
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
A Key to Real-time Insights in a Post-COVID World (ASEAN)Denodo
Watch full webinar here: https://bit.ly/2EpHGyd
Presented at Data Champions, Online Asia 2020
Businesses and individuals around the world are experiencing the impact of a global pandemic. With many workers and potential shoppers still sequestered, COVID-19 is proving to have a momentous impact on the global economy. Regardless of the current situation and post-pandemic era, real-time data becomes even more critical to healthcare practitioners, business owners, government officials, and the public at large where holistic and timely information are important to make quick decisions. It enables doctors to make quick decisions about where to focus the care, business owners to alter production schedules to meet the demand, government agencies to contain the epidemic, and the public to be informed about prevention.
In this on-demand session, you will learn about the capabilities of data virtualization as a modern data integration technique and how organisations can:
- Rapidly unify information from disparate data sources to make accurate decisions and analyse data in real-time
- Build a single engine for security that provides audit and control by geographies
- Accelerate delivery of insights from your advanced analytics project
Sotiris is currently working as Research Director with the Institute of Computer Science at the Foundation for Research and Technology - Hellas, where his research interests include systems, networks, and security. He is also a member of the European Union Agency for Network and Information Security (ENISA) Permanent Stakeholders Group! During Data Science Conference, Sotiris will talk about how data sharing between private companies and research facilities may lead to monetization.
Watch full webinar here: https://buff.ly/2XXbNB7
What started to evolve as the most agile and real-time enterprise data fabric, Data Virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
*What data virtualization really is
*How it differs from other enterprise data integration technologies
*Why data virtualization is finding enterprise wide deployment inside some of the largest organizations
Watch full webinar here: https://bit.ly/3puUCIc
What is Data Virtualization and why do I care? In this webinar we aim to help you understand not only what Data Virtualization is but also why it is a critical component of any organization's data fabric and where it fits. You will see how data virtualization liberates and empowers your business users, from data discovery and data wrangling to the generation of reusable reporting objects and data services. Digital transformation demands that we empower all consumers of data within the organization, and it demands agility too. Data Virtualization gives you meaningful access to information that can be shared by a myriad of consumers.
Watch this on-demand session to learn:
- What is Data Virtualization?
- Why do I need Data Virtualization in my organization?
- How do I implement Data Virtualization in my enterprise? Where does it fit?
One of the main goals of the I-BiDaaS project is to provide a Big Data as a self-service solution that will empower the actual employees of European companies in targeted sectors (banking, manufacturing, telecom), i.e., the true decision-makers, with the insights and tools they need in order to make the right decisions in an agile way. In this big data pilot webinar, we will demonstrate in a step by step fashion the I-BiDaaS self-service solution and its application to the banking sector. In more detail, we will present an overview of the I-BiDaaS project focusing on the requirements of the CaixaBank pilot study, the I-BiDaaS architecture with its core technologies, and a step by step demo of the I-BiDaaS solution. Last but not least, we will show through CaixaBank's success story how I-BiDaaS can resolve data availability, data sharing, and breaking silos challenges in the banking domain.
Data Pioneers - Roland Haeve (Atos Nederland) - Big data in organisatiesMultiscope
Roland Haeve is cross-competence manager for Big Data at Atos Nederland. Roland has more than 18 years of ICT experience in delivering complete solutions in areas including Business Intelligence (BI) and Big Data (Analytics). For many companies, Big Data still means pioneering and working out what the possibilities are. In his presentation, Roland will discuss successful Big Data cases. He will not only zoom in on the Netherlands but also include broader European examples.
Watch full webinar here: https://bit.ly/2vN59VK
What started to evolve as the most agile and real-time enterprise data fabric, data virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
- What data virtualization really is.
- How it differs from other enterprise data integration technologies.
- Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations.
Watch here: https://bit.ly/3i2iJbu
You will often hear that "data is the new gold". In this context, data management is one of the areas that has received more attention by the software community in recent years. From Artificial Intelligence and Machine Learning to new ways to store and process data, the landscape for data management is in constant evolution. From the privileged perspective of an enterprise middleware platform, we at Denodo have the advantage of seeing many of these changes happen.
Join us for an exciting session that will cover:
- The most interesting trends in data management.
- Our predictions on how those trends will change the data management world.
- How these trends are shaping the future of data virtualization and our own software.
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
Watch full webinar here: https://bit.ly/32c6TnG
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python and Scala, put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Attend this webinar and learn:
- How data virtualization can accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- How popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc. integrate with Denodo
- How you can use the Denodo Platform with large data volumes in an efficient way
- About the success McCormick has had as a result of seasoning the Machine Learning and Blockchain Landscape with data virtualization
Watch full webinar here: https://bit.ly/2Y0vudM
What is Data Virtualization and why do I care? In this webinar we aim to help you understand not only what Data Virtualization is but also why it is a critical component of any organization's data fabric and where it fits. You will see how data virtualization liberates and empowers your business users, from data discovery and data wrangling to the generation of reusable reporting objects and data services. Digital transformation demands that we empower all consumers of data within the organization, and it demands agility too. Data Virtualization gives you meaningful access to information that can be shared by a myriad of consumers.
Register to attend this session to learn:
- What is Data Virtualization?
- Why do I need Data Virtualization in my organization?
- How do I implement Data Virtualization in my enterprise?
Watch here: https://bit.ly/2D1fqB6
Today’s evolving data landscape has spawned new business challenges that require innovative solutions. These challenges include:
- Strategic decision-making, which relies on multiple perspectives such as social and economic factors that require combining internal and external data.
- Accounting for the increased volume and structural complexity of today’s data, and increased frequency required in delivering data assets.
- Coping with data silos that house data that must be combined and provisioned to support decision-making.
- Exposing purpose-built analytics, such as supply chain, for consumption in order to expedite decision-making.
Attend this session to learn how Data as a Service, fueled by data virtualization, overcomes these common challenges from the three dimensions of:
- Provisioning information-rich external data assets,
- Connecting data silos, and
- Enabling pre-built and packaged analytics.
Similar to Marie-Aude Aufaure keynote ieee cist 2014 (20)
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
1. Challenges and opportunities induced by Big Data and Open Data for Business Intelligence
Keynote @ IEEE CIST’2014
Marie-Aude AUFAURE
20/10/2014 IEEE CIST conference 2014
2. Agenda
• Evolution of business intelligence
– Semantic Business Intelligence
– Real-Time Business Intelligence
• Challenges and opportunities:
– Taking into account unstructured data
3. Business Intelligence
• Business Intelligence (BI) refers to a set of tools and methods dedicated to collecting, representing and analyzing data to support decision-making in enterprises.
• BI is defined as the ability for an organization to take all input data and convert them into knowledge, ultimately providing the right information to the right people at the right time via the right channel.
4. Evolution of Business Intelligence
Diagram layers: Output; User Interaction; Store; Gathering Information; Data sources
Semantic Business Intelligence: Visual analytics; Flexible queries / SPARQL; Triple Store; Semantic ETL / Batch processing; Structured/unstructured data
Classical Business Intelligence: Static report; Ad-hoc queries; Analytics; Data Warehouse; ETL / Batch processing; Databases
Real-time Business Intelligence: Real-time analytics; Databases/Triplestores; Real-time visual analytics; Knowledge enrichment; Continuous queries / Business rules; Semantic ETL / stream processing; Load shedding; Sensors; Data streams; Retro-action; Static data
6. Change factors
• The way we interact together and with data/information
7. BI needs to focus on:
• Being simple to use
• Turning any data into information/actionable knowledge
• Empowering collaboration
• Being integrated with the business processes
8. Evolution of Business Intelligence
Diagram layers: Output; User Interaction; Store; Gathering Information; Data sources
Semantic Business Intelligence: Visual analytics; Flexible queries / SPARQL; Triple Store; Semantic ETL / Batch processing; Structured/unstructured data
Real-time Business Intelligence: Real-time analytics; Databases/Triplestores; Real-time visual analytics; Knowledge enrichment; Continuous queries / Business rules; Semantic ETL / stream processing; Load shedding; Sensors; Data streams; Retro-action; Static data
Classical Business Intelligence: Static report; Ad-hoc queries; Analytics; Data Warehouse; ETL / Batch processing; Databases
9. And now?
Big Data
Open Data / Linked Data
Connected objects
10. Aspect | Characteristics | Challenges and technological answers
Volume | More visible aspect of big data but less challenging | Storage virtualisation in data centers, generalization of cloud-based solutions; NoSQL solutions for storing and querying highly distributed data
Velocity | Data produced and collected in a shorter time window | Real-time platforms; connected objects will increase volume but also real-time needs
Variety | Multiplication of data sources, from structured data to free text | New data stores integrating flexible data models; collect and analyze unstructured data
Value | More subjective aspect dealing with the non-exploitation of these massive datasets | Transform raw data into valuable information; new business models
11. Open data
• Open data is digital data, public or private, published in a way that allows users to freely access and reuse it, without any technical, legal or financial restriction.
• Examples: data on public transportation, cartography, statistics, geography, sociology, environment, etc.
• Governmental wave in the 2000s:
– data.gov project in 2009, USA
– European Directive in 2003 on reuse of public data
– In France, Etalab (2011) is in charge of data.gouv.fr, an open data portal for public data.
• Benefits for the public sector:
– Transparency, cost reduction, better services
• Economic benefits:
– Access to data, mainly for SMEs
13. More and more connected objects
14. Connected Cars
• 200 million vehicles equipped with Android Auto or Apple CarPlay in 2020
• Emergency call
• Eco-driving
• Autonomous vehicle
• Assistance
• Towards automatic driving
• 54 million vehicles totally or partially automated in 2035 (source: IHS Automotive/Polk)
15. Big Data: Challenges?
• Vector of innovation
– Disruptive technologies: cloud, internet of things, analytics
– Open innovation
• Enhancement of productivity, services and competitiveness
– Public services, « software-intensive » companies
• Economic impact
– Benefits for the analysis of internal and external data
– New jobs
• Big Data centres of excellence (Hack/Reduce in Boston)
16. BIG DATA: SOCIETAL CHALLENGES
• Big Data for Society: can we expect a positive impact on society?
• Generate actionable information that can be used to identify needs, provide services, and predict and prevent crises for the benefit of populations.
• Health and well-being, environment, energy, climate change, etc.
17. BIG DATA: ENERGY CHALLENGE
• Supercomputers
18. BIG DATA: TECHNOLOGICAL CHALLENGES
• Data storage: data centers, cloud infrastructures, NoSQL databases, in-memory databases
• Data processing: supercomputers, distributed or massively parallel computing
19. Some scientific challenges
• Big data analytics
• Context management
• Visualization and Human-Computer Interfaces
• Algorithm distribution
• Correlations and causality
• Real-time analysis of data streams
• Validation, trust
20. Big Data value chain
Source: International Working Group on Data Protection in Telecommunications
21. Potential of Big Data Analysis
• Adapt and enhance services and processes
– Transportation and logistics
– Online education
– Job seeking
– Sentiment analysis and customers'/citizens' needs
– Enhancement of public services
– E-marketing
• Optimize performance
– Assist decision-making
– Lower resource consumption
– Fraud detection
• Predict and prevent
– Health
– Needs anticipation
– Security
22. BIG DATA: USE CASES
23. Big Data opportunities
Source: Big Data opportunities survey, Unisphere / SAP, May 2013.
24. Predictive analytics: flu trends
Figure: United States flu activity (series: United States data, Google Flu Trends estimate)
25. 360-degree view of the customer
Questions: Why? What? Who? When/How? Where?
Data types: Operational data; Behavioral data; Descriptive data; Interaction data; Contextual data
26. Types of data used in Big Data initiatives
Internal data
Traditional sources
« New data »
Source: Big Data opportunities survey, Unisphere / SAP, May 2013.
27. Evolution of Business Intelligence
Diagram layers: Output; User Interaction; Store; Gathering Information; Data sources
Semantic Business Intelligence: Visual analytics; Flexible queries / SPARQL; Triple Store; Semantic ETL / Batch processing; Structured/unstructured data
Real-time Business Intelligence: Real-time analytics; Databases/Triplestores; Real-time visual analytics; Knowledge enrichment; Continuous queries / Business rules; Semantic ETL / stream processing; Load shedding; Sensors; Data stream; Retro-action; Static data
Classical Business Intelligence: Static report; Ad-hoc queries; Analytics; Data Warehouse; ETL / Batch processing; Databases
28. Coping with unstructured data
Semantic BI
Semantic Technologies for Big Data
Social Networks
29. Unstructured data analytics process
Data
• Web content
• Ontologies
• Social data
• Logs
• Texts
• Pictures, etc.
Collect
• Web crawling
• Web scraping
• API (Twitter, Google, …)
• Clicks (logs)
• Crowdsourcing (Mechanical Turk)
Extraction / Structuration
• Semantic ETL
• Named entities
• Lexico-syntactic patterns
• Dependency trees
• N-grams
Analyze
• Clustering
• Galois lattice
• Unsupervised and supervised learning
20/10/2014 Séminaire Big Data
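As a small illustration of one extraction step listed on this slide, N-grams, the following Python sketch slides a window over a token list and counts the resulting bigrams. The function name `ngrams`, the sample sentence and the whitespace tokenization are assumptions made for the example; the slides do not describe a concrete implementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text; naive lowercase + whitespace tokenization.
text = "big data and open data for business intelligence"
tokens = text.lower().split()

# Count how often each bigram occurs in the sample.
bigram_counts = Counter(ngrams(tokens, 2))
```

In a real pipeline the tokenizer would normally handle punctuation and language-specific rules; the windowing logic stays the same.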
30. SEMANTIC BI AND VISUAL ANALYTICS: THE FP7 CUBIST PROJECT
31. CUBIST: Combining and Uniting Business Intelligence with Semantic Technologies
Diagram components: databases; forums, blogs; office docs; Semantic ETL; Triple Store; flexible and visual queries / analytics
Exploitable Results: Semantic Business Intelligence; Comprehensive Information Access Means
Advanced Visual Analytics
■ Searching, exploring, analyzing data
■ Qualitative data analysis
■ Graph-based visualizations
No existing solutions from BI vendors
Semantically enriched BI
■ Using a triple store for BI
■ Using ontologies as schema
Partly addressed by BI or ST vendors
BI over both structured and unstructured data
■ Text analytics
■ Linking unstructured and structured sources
Already addressed/developed by BI vendors
32. Formal Concept Analysis
• Formal Concept Analysis is a method used for investigating and processing explicitly given information
– An analysis of data
– Structures of formal abstractions of concepts of human thought
– "Formal" emphasizes that the concepts are mathematical objects, rather than concepts of mind
– Formal Concept Analysis helps to draw inferences, to group objects, and hence to create concepts
• Visual representation by a Hasse diagram
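For readers unfamiliar with the method, the standard textbook definitions behind this slide can be written as follows. The symbols are the conventional ones and are not taken from the slides: a formal context is a triple (G, M, I) of objects G, attributes M and an incidence relation I between them, with A a set of objects and B a set of attributes.

```latex
\[
A' = \{\, m \in M \mid (g, m) \in I \ \text{for all } g \in A \,\}, \qquad
B' = \{\, g \in G \mid (g, m) \in I \ \text{for all } m \in B \,\}
\]
\[
(A, B) \text{ is a formal concept} \iff A' = B \ \text{and} \ B' = A, \qquad
(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2
\]
```

The Hasse diagram mentioned on the slide draws this order on concepts, with larger extents placed higher.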
33. Charts, Graphs, FCA for BI: A Toy Example
Skill | Persons with that Skill
IE | Anja, Ben, Ernst, Fred, Ken
ETL | Chris, Fred, Mark
BI | Ben, Chris, Fred, Lemmy, Mark, Naomi
ST | Anja, Diana, Ernst, Fred, Gerald, Harriet, Ken, Owen
FCA | Anja, Diana, Gerald, Harriet, Ian, John, Ken, Owen
VIZ | Anja, Diana, Ian
Possible Information Needs:
1) Show me the count of people for a given skill
2) Show me the skills and how many people share some skills, in order to get an idea of how strongly skills are related
3) Show me the skills and people such that I get an idea of the distribution of skills among people and dependencies between skills
34. Converting the data (analytic model)
Raw Data
Skill | Persons with that Skill
IE | Anja, Ben, Ernst, Fred, Ken
ETL | Chris, Fred, Mark
BI | Ben, Chris, Fred, Lemmy, Mark, Naomi
ST | Anja, Diana, Ernst, Fred, Gerald, Harriet, Ken, Owen
FCA | Anja, Diana, Gerald, Harriet, Ian, John, Ken, Owen
VIZ | Anja, Diana, Ian
Bar Chart Data: counting the number of people per skill
Graph Data: counting the number of people who share two skills
FCA Data (Formal Context)
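The first two conversions on this slide can be sketched in a few lines of Python using the toy table. The variable names `skills`, `bar_chart` and `shared` are illustrative choices, not names from the talk, and the sketch does not claim to reproduce the tooling used to build the slides.

```python
from itertools import combinations

# Raw data from the toy example: skill -> persons with that skill.
skills = {
    "IE": {"Anja", "Ben", "Ernst", "Fred", "Ken"},
    "ETL": {"Chris", "Fred", "Mark"},
    "BI": {"Ben", "Chris", "Fred", "Lemmy", "Mark", "Naomi"},
    "ST": {"Anja", "Diana", "Ernst", "Fred", "Gerald", "Harriet", "Ken", "Owen"},
    "FCA": {"Anja", "Diana", "Gerald", "Harriet", "Ian", "John", "Ken", "Owen"},
    "VIZ": {"Anja", "Diana", "Ian"},
}

# Bar chart data: number of people per skill.
bar_chart = {skill: len(people) for skill, people in skills.items()}

# Graph data: for each pair of skills, the number of people who have both.
# Pairs with no shared person are omitted, i.e. there is no edge between them.
shared = {
    (a, b): len(skills[a] & skills[b])
    for a, b in combinations(skills, 2)
    if skills[a] & skills[b]
}
```

Here `shared[("ST", "FCA")]` is 6 and `shared[("IE", "FCA")]` is 2, while the pair `("ETL", "FCA")` has no entry because nobody holds both skills.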
35. Visualizing the data
Raw Data
Skill | Persons with that Skill
IE | Anja, Ben, Ernst, Fred, Ken
ETL | Chris, Fred, Mark
BI | Ben, Chris, Fred, Lemmy, Mark, Naomi
ST | Anja, Diana, Ernst, Fred, Gerald, Harriet, Ken, Owen
FCA | Anja, Diana, Gerald, Harriet, Ian, John, Ken, Owen
VIZ | Anja, Diana, Ian
Figure panels: Bar Chart; Graph; FCA Concept Lattice
36. Some Information which can be read off
Bar Chart
§ ST and FCA are the skills most people have
§ ETL and VIZ are the skills the fewest people have
Graph
§ The skills FCA and ST are strongly related
– Because the link between them is strong
§ The skills FCA and IE are only weakly related
– Because the link between them is weak
§ No one has knowledge of both FCA and ETL
– Because there is no link between FCA and ETL
FCA lattice
§ Owen, Harriet and Gerald have exactly the same skills
– Because they belong to the same node
§ Whoever is skilled in ETL is skilled in BI, too
– Because the BI-node is above the ETL-node
§ Anja has more skills than Ken, and Ken has more skills than Ernst
– Because the nodes are ordered that way
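The lattice statements can be checked directly against the formal context of the toy example. The Python sketch below is illustrative only: it enumerates concepts by brute force over all attribute subsets, which is fine for six skills but would not scale, and the helper names (`extent`, `intent`, `formal_concepts`, `implies`) are assumptions made for this example rather than the API of any FCA library.

```python
from itertools import combinations

# Formal context of the toy example: person (object) -> skills (attributes).
CONTEXT = {
    "Anja": {"IE", "ST", "FCA", "VIZ"},
    "Ben": {"IE", "BI"},
    "Chris": {"ETL", "BI"},
    "Diana": {"ST", "FCA", "VIZ"},
    "Ernst": {"IE", "ST"},
    "Fred": {"IE", "ETL", "BI", "ST"},
    "Gerald": {"ST", "FCA"},
    "Harriet": {"ST", "FCA"},
    "Ian": {"FCA", "VIZ"},
    "John": {"FCA"},
    "Ken": {"IE", "ST", "FCA"},
    "Lemmy": {"BI"},
    "Mark": {"ETL", "BI"},
    "Naomi": {"BI"},
    "Owen": {"ST", "FCA"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attrs):
    """Persons who have every skill in attrs."""
    return frozenset(p for p, skills in CONTEXT.items() if attrs <= skills)


def intent(persons):
    """Skills shared by every person in persons (all skills if persons is empty)."""
    shared = set(ATTRIBUTES)
    for p in persons:
        shared &= CONTEXT[p]
    return frozenset(shared)


def formal_concepts():
    """All (extent, intent) pairs, found by closing every subset of attributes."""
    concepts = set()
    attrs = sorted(ATTRIBUTES)
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            ext = extent(frozenset(subset))
            concepts.add((ext, intent(ext)))
    return concepts


def implies(premise, conclusion):
    """True if everyone having all premise skills also has all conclusion skills."""
    return extent(frozenset(premise)) <= extent(frozenset(conclusion))
```

For instance, `implies({"ETL"}, {"BI"})` holds while `implies({"BI"}, {"ETL"})` does not, and `extent(frozenset({"FCA", "ETL"}))` is empty, matching the statements read off the diagrams.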
37. Comparison
Bar Chart / Graph / FCA lattice
+ Many well-known visualizations
+ Good (readable and comprehensible) layouts
+ Good for analyzing numbers
- Loss of information (what people)
- Misleading for overlapping attributes (counting people manifold)
- Not utilizing relationships between entities
+ Attractive visualizations
+ (Relatively) easy to understand
+ Utilizing and showing links between entities (skills)
- Loss of information (what people)
- Bad for analyzing numbers
- Number of nodes might explode
- Finding good layout is unsolved (nice layout in example is accidental and has been manually created)
- Unfamiliar means for analytics
- Scalability
- Bad for analyzing numbers
+ No loss of information
+ Meaningful clusters in one node
+ Showing dependencies between entities (both people and skills)
38. Which visualization should I choose?
Remember the information needs from the beginning
Show me the skills and how many people share some skills, in order to get an idea of how strongly skills are related
Show me the skills and people such that I get an idea of the distribution of skills among people and dependencies between skills
Show me the count of people for a given skill
Conclusion
§ Each visualization has its own strengths and weaknesses
§ Each type of visualization is suited for a specific type of information need
§ Thus the visualizations are complementary
§ Thus future BI tools should provide all types of visualizations
40. Visual Analytics
• Visual analytics supports human judgment by means of visual representations and interaction techniques [Keim et al. 2001]
• “Overview first, zoom and filter, then details-on-demand.” [Shneiderman, 1996]
• Visual Analytics for FCA combines:
– Traditional BI operations and visualizations
– Concept Lattice transformation and visualization
41. FCA-based Visual Analytics
• Idea: Create visual analytics for large contexts
– Context reduction
– Allow visual queries through selection and filtering
– Dynamic visualization
– Visual exploration becomes a navigation problem
42. Cubix: A Visual Analytics tool for FCA
• Combines interactive features to overcome drawbacks of single techniques
• Features
– Visualisations
– Dashboard
– Metrics
– Filtering & Search
– Clustering
– Tree-Extraction
Publication: ICDM 2012 [Melo et al.]
live: cubix.alwaysdata.com
43. Summary of Visualisations
Analysis Task | Data | Visualisation
Co-occurrence analysis | Concept Lattice | Enhanced Hasse diagram
Exploratory Hierarchical analysis | Tree from the concept lattice | Sunburst
Frequent itemsets analysis | Attributes and objects matrix | Concept stacking (matrix)
Simulation parameters analysis | Multi-valued attributes | Heatmap lattice
Implication analysis | Association Rules | Radial/Matrix visualisation for Association Rules
44. Coming back to ease of use
• Cubix was tested on three use cases
– The workflow (data selection, scaling, filtering and analysis) needed to be simplified
• User creation of Analytics
– Leading to « BI as a service »
• Automatic recommendation of Visualization and gadgets:
– Decision tree
• Based on the data type and volume
– Collaborative filtering
• Based on other users’ preferences for similar datasets
– Supervised Learning methods
• Based on the user’s profile and history
45. Coping with big data for FCA
• Reduction techniques
– Filtering (support, stability)
• Distributed computing of concepts
• Mining Formal Concepts over data streams
• Visual Analytics
– New metaphors for large data
– Data overview view: dashboards
• Filtering
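Filtering by support can be illustrated as follows. In this sketch, the support of a concept is assumed to be the fraction of all objects that appear in its extent, a common definition that the slides do not spell out; stability is not shown. The function name `filter_by_support` and the threshold are hypothetical, and the three concepts are taken from the toy skills example.

```python
# Three formal concepts of the toy skills context, as (extent, intent) pairs.
CONCEPTS = [
    ({"Chris", "Fred", "Mark"}, {"ETL", "BI"}),
    ({"Anja", "Diana", "Gerald", "Harriet", "Ken", "Owen"}, {"ST", "FCA"}),
    ({"Fred"}, {"IE", "ETL", "BI", "ST"}),
]
N_OBJECTS = 15  # number of persons in the toy context


def filter_by_support(concepts, n_objects, min_support):
    """Keep the concepts whose extent covers at least min_support of all objects."""
    return [
        (ext, itt)
        for ext, itt in concepts
        if len(ext) / n_objects >= min_support
    ]


# Only the {ST, FCA} concept (6 of 15 persons) passes a 25% threshold.
kept = filter_by_support(CONCEPTS, N_OBJECTS, 0.25)
```

Raising the threshold shrinks the lattice to its most populated concepts, at the cost of hiding rare but possibly interesting ones.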
47. Semantic Technologies for Big Data
• Data-driven approaches (structure learning, data mining, statistical approaches) are not always sufficient to find all correlations among parameters
• Semantic approaches can provide complementary information:
– Simplify the information integration process
– Provide a unified metadata layer
– Discover and enrich information
– Provide a unified access to information
48. Semantic processing
• Helping to make sense of large or complex sets of data without being supplied with any knowledge about the data
• Turning any data into information/actionable knowledge
• Some examples:
– NLP technologies
– Data Mining
– Artificial Intelligence
– Classification
– Semantic Search
49. Semantic technologies / Semantic Web
• "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (Tim Berners-Lee, 2001)
• Standards include:
– a flexible data model (RDF)
– schema and ontology languages for describing concepts and relationships (RDFS and OWL)
– a query language (SPARQL)
• Use of semantic technologies in semantic processing (e.g. semantic search)
• Use of semantic technologies for storing and querying data (triple store and SPARQL)
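To make the triple-store idea concrete, here is a deliberately tiny Python sketch of the RDF data model and of matching a single triple pattern, the basic building block of a SPARQL query. It is not a real triple store or SPARQL engine: the `match` function, the `ex:` names and the sample triples are all invented for illustration.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
TRIPLES = {
    ("ex:Anja", "ex:hasSkill", "ex:FCA"),
    ("ex:Anja", "ex:hasSkill", "ex:VIZ"),
    ("ex:Ian", "ex:hasSkill", "ex:FCA"),
    ("ex:FCA", "rdfs:label", "Formal Concept Analysis"),
}


def match(pattern, triples=TRIPLES):
    """Return variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly what SELECT ?who WHERE { ?who ex:hasSkill ex:FCA } asks for.
who_knows_fca = sorted(b["?who"] for b in match(("?who", "ex:hasSkill", "ex:FCA")))
```

A real deployment would use an RDF library or a triple store with indexes; the point here is only that data and queries share the same three-part shape.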
50. Semantic Data Aggregation and Linking for Big Data
• Transforming unstructured content into a structured format for later analysis is a major challenge.
• The value of data explodes when it can be linked with other data, thus data integration is a major creator of value
• Data aggregation from various sources can establish the veracity
• Semantic technologies are a way of addressing variety
51. Linked Data / Web of Data
• Linked Data is a set of principles that allows publishing, querying and consumption of RDF data, distributed across different servers
• Not necessarily free / open data
• Exponential growth -> a Big Data approach: enriching Big Data with metadata & semantics, interlinking Big Data sets
• PricewaterhouseCoopers, 2009: « You’ll be able to find pieces of data sets from different places, aggregate them without warehousing, and analyse them in a more straightforward, powerful way »
52. Semantic Technologies for Big Data
• Natural Language Processing (NLP)
• Ontology Engineering techniques
• Semantic enrichment:
– Addition of contextual information
– Semantic annotation
– Data categorization / classification
– Improved information retrieval
– Reasoning
53. Semantic Data Aggregating and Linking for Big Data
[Figure: DATA LAYER (structured and non-structured sources: database, documents, web pages, sensor data, textual content, social media) and KNOWLEDGE LAYER (semantic aggregation, semantic enrichment and disambiguation, linking data), with ontologies and Linked Open Data]
55. Pattern-based Technique
Query = "Olive Garden" + "Darden Rest"
The first owner of [Olive Garden] was the famous [Darden Rest]VAL
57. Value of Semantic Technologies
• Semantic Technologies provide opportunities for reducing the cost and complexity of data integration
• Common metadata layer
• Powerful solutions to find and explore information
• Semantic Technologies are a good fit for Big Data’s Variety
• Velocity and Volume: challenging issues for Semantic Technologies
• Linked Data will grow into Big Linked Data, but Big Data will also benefit from evolving into Linked Big Data
59. Graphs everywhere
- Social networks
- Web
- Enterprise databases
- Biology
- Etc.
Simple management of structured, semi-structured and unstructured information
Relational databases, XML, Web
60. Graphs: what can we do with them?
• Traversing linked information, finding shortest path, doing (semantic) partition
• Recommendation and discovery of potentially interesting linked information
• Exploit the graph structure of large repositories
– Web environment
– Digital documents repositories
– Databases with metadata
• Use cases: recommendation, social networks
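As a small illustration of traversing linked information and finding a shortest path, the Python sketch below runs a breadth-first search over an adjacency-list graph. The graph of "knows" links between people is invented for this example, and breadth-first search is just one standard choice for unweighted graphs, not necessarily the method used in the projects described.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a shortest path from start to goal as a list of nodes, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:
                # Walk back through the parents to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical directed "knows" links between people.
KNOWS = {
    "Anja": ["Ben", "Chris"],
    "Ben": ["Diana"],
    "Chris": ["Diana"],
    "Diana": ["Ernst"],
    "Ernst": [],
}
path = shortest_path(KNOWS, "Anja", "Ernst")
```

The same traversal underlies simple recommendation: nodes reached within two or three hops of a user are natural candidates for "potentially interesting linked information".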
61. Graphs for Social networks: enterprises use case
• A technology for internal communication, information sharing and collaboration
• A technology for information communication towards clients
– Vote for the best product,
– Understand the clients’ needs
• A technology for watching the gossip
– E-reputation, opinion mining
• A technology for creating collective intelligence
– Collaborative common knowledge
– Wikis and blogs associated to social networks
62. Graphs for Social networks: public administrations use case
• Public administrations need social networks:
– As enterprises:
• To analyze internal networks (projects, organization…)
• To analyze external networks (suppliers, clients, partners…)
– As an interface for citizens:
• To be well-understood by citizens (who does what)
• To understand citizens (who says what)
• Example scenarios:
– Need to look over the organizational structure (employees, departments, transversal projects) and identify costs
– Need for citizens to understand the impact of public policies (offered services, available resources for each district of the city, which projects are the most relevant, citizens’ complaints)
– Opinion analysis from external social networks (Twitter for example)
63. Social web – Social Networks
• The Social Semantic Web combines technologies, strategies and methodologies from the Semantic Web, social software and the Web 2.0.
• Web 2.0 allows users to express their opinion on products and services
• Understanding “what people think” can support decision-making, both for consumers and producers
64. Sentiment Analysis – Opinion mining
Find out what other people think. Is it possible?
What does opinion mining mean?
The beginning of wisdom is the definition of terms! (Socrates)
Today, vendors, practitioners, and the media alike call this still-nascent arena everything from
‘brand monitoring,’ ‘buzz monitoring’ and ‘online anthropology,’ to ‘market influence analytics,’
‘conversation mining’ and ‘online consumer intelligence’. . . . In the end, the term ‘social media
monitoring and analysis’ is itself a verbal crutch. It is placeholder [sic], to be used until
something better (and shorter) takes hold in the English language to describe the topic of this
report.
Zabin and Jefferies: “Social media monitoring and analysis: Generating
consumer insights from online conversation,”
65. Opinion mining – possible uses
Recommender systems (avoid recommending items that received a lot
of negative feedback).
Information Filtering
Business Intelligence (why aren’t consumers buying my laptop?).
Question answering (what did you want to say?)
Clarification of politicians’ positions!
eDemocracy…and so on
66. Opinion mining – Sociology
Who is positively or negatively disposed toward whom
Who would be more or less receptive to new information transmission
from a given source.
Structural balance theory: group cohesion and overall polarity among
people.
67. Opinion mining – The perfect tool
The development of a complete opinion-search application might involve
1) Determine which documents or portions of documents contain
opinionated material.
2) Identify the overall sentiment expressed by these documents and/
or the specific opinions regarding particular features or aspects of the
items or topics in question, as necessary.
3) Finally, the system needs to present the sentiment information
it has garnered in some reasonable summary fashion (aggregation
of “votes”, selective highlighting of some opinions, etc)
68. Opinion mining – Polarity
A basic task in sentiment analysis is classifying the polarity of a given
text at the document, sentence, or feature/aspect level — whether
the expressed opinion in a document, a sentence or an entity feature/
aspect is positive, negative, or neutral.
A polarity is a real number quantifying the user’s positive, negative or
neutral opinion.
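The three steps of an opinion-search application, and the idea of polarity as a real number, can be sketched with a toy lexicon-based scorer in Python. Everything specific here is an assumption made for illustration: the lexicon and its weights, the averaging rule, and the function names. Practical systems rely on much richer resources and models.

```python
import re

# Hypothetical toy lexicon: opinion word -> polarity in [-1, 1].
LEXICON = {
    "good": 0.6, "great": 0.8, "excellent": 1.0,
    "bad": -0.6, "terrible": -0.9, "awful": -1.0,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarity(text):
    """Mean lexicon score of the opinion words in text, or None if there are none."""
    scores = [LEXICON[t] for t in tokenize(text) if t in LEXICON]
    return sum(scores) / len(scores) if scores else None


def summarize(documents):
    """Step 1: keep opinionated documents. Step 2: score them. Step 3: count 'votes'."""
    scored = [(doc, polarity(doc)) for doc in documents]
    opinionated = [(doc, p) for doc, p in scored if p is not None]
    return {
        "positive": sum(p > 0 for _, p in opinionated),
        "negative": sum(p < 0 for _, p in opinionated),
        "neutral": sum(p == 0 for _, p in opinionated),
        "skipped": len(documents) - len(opinionated),
    }


summary = summarize(["Great hotel", "Terrible food", "Opens at noon"])
```

Here a document without any lexicon word is treated as non-opinionated and skipped, which is a crude stand-in for the subjectivity detection of step 1.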
69. Detecting feature sentiment in user-generated reviews
It is not possible to summarize everything with a unique vote/
polarity ⇒ detect local polarities expressed about the salient
features of a considered domain.
Extract the most frequent domain-related features
Good Location, Terrible Food: Detecting Feature Sentiment in User-Generated Reviews, Cataldi et al., 2013 - SNAM
70. Combining statistics and NLP
1) We identify the most characterizing aspects of one domain (hotels, restaurants, products) by analyzing the domain corpus and extracting the most frequent terms (possibly structuring them as a vocabulary and/or ontology)
2) We formalize the content of each review as a dependency tree among its terms and retrieve (if they exist) the features discussed within it. Then, by using the tree, we aim at discovering all the other terms that convey some polarity and are linguistically connected to them.
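The two steps can be approximated in a short Python sketch. Note the substitution: instead of a dependency tree, this example links a feature to the nearest opinion word that precedes it within a small window, which is a much cruder heuristic than the linguistic connection described above. The lexicon, stopword list, function names and sample reviews are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical opinion lexicon and stopword list.
LEXICON = {"good": 1.0, "great": 1.0, "terrible": -1.0, "bad": -1.0}
STOPWORDS = {"the", "a", "was", "and", "but", "is", "very"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequent_features(reviews, top_n=2):
    """Step 1: the most frequent terms that are neither opinion words nor stopwords."""
    counts = Counter(
        tok
        for review in reviews
        for tok in tokenize(review)
        if tok not in LEXICON and tok not in STOPWORDS
    )
    return [term for term, _ in counts.most_common(top_n)]


def feature_polarities(review, features, window=2):
    """Step 2 (approximation): polarity of the nearest preceding opinion word."""
    tokens = tokenize(review)
    result = {}
    for i, tok in enumerate(tokens):
        if tok not in features:
            continue
        for prev in reversed(tokens[max(0, i - window):i]):
            if prev in LEXICON:
                result[tok] = LEXICON[prev]
                break
    return result


REVIEWS = [
    "Good location, terrible food",
    "The food was great",
    "Great location and good food",
]
features = frequent_features(REVIEWS)
local = feature_polarities(REVIEWS[0], set(features))
```

The second review shows the limit of the window heuristic: "great" follows "food", so no polarity is attached to it. Handling such cases is one reason to work on a dependency tree rather than on word positions.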
71. [Figure: feature-sentiment architecture. Components: Raw text, POS-tagging, Linguistic Parser (Phrase Structure, English Corpus), Dep. Graph G; Feature Extractor, Feature Set F (feature1 to feature4, ranking), subset of features F_i in G; WordNet (term, synset, pos./neg. polarity), Synset Polarity computation; synsets in G carrying some sentiment and referred to a feature in F_i; Sentiment Computation, polarity for feature1]
72. Graphs and social networks
• Can be useful for many applications:
– E-reputation and trust management
– Monitoring of social networks for security
– Recommendation of corporate data/information
– Retail
Is Twitter just a mirror of mass sentiment or is it also able to influence opinion?
73. Conclusion
• Many models should be combined:
– Ontologies, graphs, formal concepts, predictive models
• Many techniques should be combined:
– Natural language processing
– Machine learning and statistics
– Ontology engineering, Linked Data Management
– Graphs processing
– Visualization
– Crowdsourcing, scraping
• For Semantic Enrichment
74. Challenges
• Semantic Information aggregation
– Pattern extraction from streams and cross-analysis
– Information extraction from Linked Open Data: concepts and relations linked to the streams patterns
– Opinion aggregation from social media and web
– Social aspects for collaboration
– Information aggregation: “too much data to assimilate but not enough knowledge to act”
• Distributed and real-time processing
– Design of real-time and distributed algorithms for stream processing and information aggregation
– Storage and indexation of a knowledge base
– Integration of business processes with aggregated information
– Distribution and parallelization of data mining algorithms
• Visual analytics and user modeling
– Dynamic user model
– Novel visualizations for very large datasets