Knowledge extraction and incorporation is currently considered beneficial for efficient Big Data analytics. Knowledge can inform workflow design, constraint definition, parameter selection and configuration, human interaction, and decision-making strategies. Here we present BIGOWL, an ontology to support knowledge management in Big Data analytics. BIGOWL is designed to cover a wide vocabulary of terms concerning Big Data analytics workflows, including their components and how they are connected, from data sources to analytics visualization. It also takes into consideration aspects such as parameters, restrictions, and formats. This ontology defines not only the taxonomic relationships between the different concepts, but also instances representing specific individuals to guide users in the design of Big Data analytics workflows. For testing purposes, two case studies are developed: first, real-world stream processing of traffic Open Data with Spark, for route optimization in the urban environment of New York City; and second, data mining classification of an academic dataset on local/cloud platforms. The analytics workflows resulting from the BIGOWL semantic model are validated and successfully evaluated.
This document provides an introduction to big data and analytics. It discusses the topics of data processing, big data, data science, and analytics and optimization. It then provides a historic perspective on data and describes the data processing lifecycle. It discusses aspects of data including metadata and master data. It also discusses different data scenarios and the processing of data in serial versus parallel formats. Finally, it discusses the skills needed for a data scientist including business and domain knowledge, statistical modeling, technology stacks, and more.
This document discusses techniques for pre-processing big data to improve the quality of analysis. It covers exploring and cleaning data by handling missing values, reducing noise, and reducing dimensions. Data transformation techniques are also discussed, such as standardizing, aggregating, and joining data. Finally, the document emphasizes that data preparation is a key factor in model quality and generating insights from trusted data.
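To make the cleaning and transformation steps above concrete, here is a minimal sketch in plain Python: imputing missing values with the mean, then standardizing to z-scores. The function names and toy values are illustrative, not taken from the document.

```python
import statistics

def impute_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]

# A toy column with one missing reading, cleaned and then standardized.
raw = [10.0, None, 14.0, 12.0]
clean = impute_missing(raw)   # [10.0, 12.0, 14.0, 12.0]
scaled = standardize(clean)
```

In a real pipeline these steps would run over columns of a distributed dataset, but the logic per column is the same.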
This document discusses two case studies of organizations that partnered with Synoptek to improve their IT services and operations. The first case study was of a women's healthcare organization that wanted better infrastructure availability, performance, and security. With Synoptek's help, they reduced costs by 20% and improved IT performance, security, and availability. The second case study was of a community college that wanted to transform programs, optimize operations, and engage students. Synoptek helped them deliver on these goals through managed IT services and support.
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology - InfiniteGraph
Join Oracle NoSQL DB and InfiniteGraph development teams in a discussion of the latest trends in Big Data and Graph Technology. Learn what Oracle’s view of Big Data is and how Oracle NoSQL Database technologies enable you to manage vast amounts of real-time key-value data.
Big data technology by Data Sciences Thailand at THE FIRST NIDA BUSINESS A... - BAINIDA
Big data technology by Data Sciences Thailand, presented at THE FIRST NIDA BUSINESS ANALYTICS AND DATA SCIENCES CONTEST/CONFERENCE, organized by the School of Applied Statistics and DATA SCIENCES THAILAND.
On Big Data Analytics - opportunities and challenges - Petteri Alahuhta
This document discusses big data analytics and its opportunities and challenges. It defines big data and explains the increasing number of "V's" that characterize big data, such as volume, velocity, variety, and veracity. It also outlines some common uses of big data analytics including customer insights, security and risk analysis, and resource optimization. Additionally, it discusses challenges of big data adoption like skills shortages and infrastructure limitations, as well as trends in big data and areas of expertise related to big data that VTT focuses on.
How is data governance like an amusement park? - Denodo
Watch full webinar here: https://bit.ly/3Ab9gYq
Imagine arriving at an amusement park with your family and starting your day without the typical map that lets you plan which shows to see, which rides to visit, and where the kids can and cannot ride... You probably won't get the most out of your day, and you will have missed many things. Some people like to go in unprepared and discover things little by little, but when we talk about business, improvising can be fatal...
In the era of information exploding across multiple sources, data governance is key to guaranteeing the availability, usability, integrity, and security of that information. Likewise, the set of processes, roles, and policies it defines allows organizations to reach their goals while ensuring the efficient use of their data.
Data virtualization, a strategic tool for implementing and optimizing data governance, lets companies build a 360º view of their data and establish security controls and access policies across the entire infrastructure, regardless of data format or location. In this way, it brings together multiple data sources, makes them accessible from a single layer, and provides lineage capabilities to monitor changes in the data.
In this webinar you will learn:
- How to accelerate the integration of data from fragmented sources across internal and external systems and obtain a comprehensive view of the information.
- How to enable a single, protected data-access layer across the entire company.
- How data virtualization provides the foundations for complying with current data protection regulations through data auditing, cataloging, and security.
This document summarizes a webinar on building smart cities. It discusses using semantic technologies like ontologies, taxonomies, and knowledge graphs to build smart city platforms and applications. Speakers from Semantic Web Company and Findwise discuss semantic data integration, case studies of semantic platforms for healthcare information in Australia and smart city data in Gothenburg, and tools for building semantic solutions like the PoolParty Semantic Suite. The webinar covers challenges in building smart cities and how semantic technologies can help with areas like data modeling, integration, and machine learning on city data. It concludes with a Q&A session.
ESSnet Big Data WP8 Methodology (+ Quality, +IT) - Piet J.H. Daas
1. The documents discuss methodology, quality, and IT aspects of big data within the ESSnet Big Data project.
2. Key topics addressed include the big data processing lifecycle, metadata management challenges, and quality aspects like coverage, accuracy, and comparability over time.
3. Common themes that emerged across work packages include the need for a unified framework for data integration and metadata, and the value of shared software and training resources.
Applied Data Science Course Part 2: the data science workflow and basic model... - Dataiku
In the second part of our applied machine learning online course, you'll get an overview of the different steps in the data science workflow as well as a deep dive in 3 basic types of models: linear, tree-based and clustering.
This document provides an overview of big data and big data analytics. It defines big data as large, complex datasets that grow quickly in volume and variety. Big data analytics involves examining these large datasets to find patterns and useful information. The challenges of big data include increased storage needs and handling diverse data formats. Hadoop is a framework that allows distributed processing of big data across clusters of computers. Common big data analytics tools include MapReduce, Spark, HBase and Hive. The benefits of big data analytics include improved decision making, customer service and efficiency.
Big Data Analytics and Hadoop is presented. Key points include:
- Big data is large and complex data that is difficult to process using traditional methods. Domains that produce large datasets include meteorology, physics simulations, and internet search.
- The four V's of big data are volume, velocity, variety, and veracity. Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of computers. Its core components are HDFS for storage and MapReduce for processing.
- Apache Hadoop has gained popularity for big data analytics due to its ability to process large amounts of data in parallel using commodity hardware, its scalability, and automatic failover. A Hadoop ecosystem of related projects has also grown up around these core components.
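The MapReduce model referred to above can be illustrated with a single-process word count. This is a sketch of the programming model only (map, shuffle, reduce), not of Hadoop's distributed runtime:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle pairs by key, then sum each key's counts as a reducer would."""
    shuffled = sorted(pairs, key=itemgetter(0))  # the "shuffle" step
    return {word: sum(count for _, count in group)
            for word, group in groupby(shuffled, key=itemgetter(0))}

docs = ["big data", "big clusters"]
counts = reduce_phase(map_phase(docs))  # {'big': 2, 'clusters': 1, 'data': 1}
```

In Hadoop proper, the map tasks run on the cluster nodes holding each HDFS block, and the framework performs the shuffle across the network before the reducers run.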
Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenp... - VMware Tanzu
This document discusses using Dataiku as an end-to-end enterprise AI platform to drive data science at scale using PostgreSQL, Greenplum, and Dataiku. It highlights how Dataiku allows business analysts, data engineers, and data scientists to collaborate on a single platform for data preparation, machine learning modeling, and model deployment. It also provides examples of how customers like a major software company have leveraged Dataiku to automate the deployment of over 12,000 predictive models.
This document outlines Austria's strategy for conquering data through research and technology initiatives (RTI). It discusses:
1) The role of the Austrian Federal Ministry for Transport, Innovation and Technology (bmvit) in funding RTI, including a €450M annual budget and focus areas like ICT, production, energy and mobility.
2) Studies conducted on topics like the technology roadmap for conquering data through intelligent systems.
3) Actions already taken and planned, including developing lead technologies for data integration and fusion, increasing algorithm efficiency, and establishing a data-services ecosystem in Austria.
4) The need for coordination across stakeholders, developing the necessary legal framework, and ensuring resources and competencies
Why Your Data Science Architecture Should Include a Data Virtualization Tool ... - Denodo
Watch full webinar here: https://bit.ly/35FUn32
Presented at CDAO New Zealand
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python, and Scala, put advanced techniques at the fingertips of data scientists.
However, most architectures laid out to enable data scientists miss two key challenges:
- Data scientists spend most of their time looking for the right data and massaging it into a usable format
- Results and algorithms created by data scientists often stay out of the reach of regular data analysts and business users
Watch this session on-demand to understand how data virtualization offers an alternative that addresses these issues and can accelerate data acquisition and massaging. The session also includes a customer story on the use of machine learning with data virtualization.
F.A.I.R. Data Principles with Knowledge Graphs & AI: challenges and opportunities with emerging technologies and the paradigm shift in information management and data governance.
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013) - Mark Rittman
A presentation from ODTUG 2013 on tools other than OBIEE for Exalytics, focusing on analysis of non-traditional data via Endeca, "big data" via Hadoop and statistical analysis / predictive modeling through Oracle R Enterprise, and the benefits of running these tools on Oracle Exalytics
How to crack Big Data and Data Science roles - UpXAcademy
How to crack Big Data and Data Science roles is the flagship event of UpX Academy. This slide deck was used for the event on 10th Sept, which was attended by hundreds of participants globally.
This document provides an overview of Anzo Unstructured, a natural language processing (NLP) platform from Cambridge Semantics. It discusses the core capabilities of Anzo Unstructured, including intake of various file formats, extraction of entities and relationships, and semantic analysis. It also outlines example use cases in pharma and finance. The document demonstrates the configuration and visualization of Anzo Unstructured pipelines and annotations.
1) The document proposes FAME.Q, a formal approach to assess data quality in enterprise linked data. It defines data quality as the degree to which data meets specific requirements.
2) FAME.Q assesses data quality at the instance, schema, and service levels using common quality dimensions and appropriate metrics. It represents data quality levels as percentages to indicate how "fatal" or "perfect" the data is.
3) The approach builds on existing data quality definitions, applies them to semantic web data, and represents the assessments in a formalized schema. It outputs results using the data quality vocabulary.
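A quality dimension reported as a percentage, in the spirit of FAME.Q, might look like this minimal completeness metric. The function, field names, and sample rows are illustrative assumptions, not part of FAME.Q itself:

```python
def completeness(records, required_fields):
    """Share of required fields that are populated, as a percentage.

    0% would correspond to entirely "fatal" data, 100% to "perfect" data.
    """
    total = len(records) * len(required_fields)
    filled = sum(1 for record in records for field in required_fields
                 if record.get(field) not in (None, ""))
    return 100.0 * filled / total

rows = [
    {"id": 1, "name": "Ada", "email": "ada@example.org"},
    {"id": 2, "name": "", "email": None},       # two missing required fields
]
score = completeness(rows, ["id", "name", "email"])  # 66.66...
```

An instance-level assessment like this would be one of several dimensions; schema- and service-level checks would add metrics of their own, and the results would be serialized using the data quality vocabulary.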
Smart Data Slides: Data Science and Business Analysis - A Look at Best Practi... - DATAVERSITY
Google “citizen data scientist” today and you will see about 1M results. That number is data. It may be interesting, but it is meaningless without context. Sometimes it appears that we are drowning in data from systems and sensors but starving for insights. We definitely produce more of the former than the latter, which has created demand for more powerful tools to simplify the process and lower the skills requirement for analysis. As vendors build systems to meet this demand, we hear about the coming “democratization” of big data as more people at varying levels within organizations are empowered to find meaning and improve their own performance with data-driven insights. This is a good thing, but it does require caution.
To paraphrase Col Jessup in A Few Good Men: You want answers? You can’t handle the data.
In this webinar, we will survey emerging approaches to simplifying analysis, and discuss the benefits, dangers, and skills required for individuals and organizations to thrive in the brave new world of analytics everywhere, for everyone.
A Seminar Presentation on Big Data for Students.
Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured or time sensitive or simply very large cannot be processed by relational database engines. This type of data requires a different processing approach called big data, which uses massive parallelism on readily-available hardware.
Hadoop is a Java framework for managing large datasets distributed across clusters of commodity hardware. It allows for the distributed processing of large datasets across clusters of computers using simple programming models. Hadoop features distributed storage and processing of data and is designed to scale up from single servers to thousands of machines, each offering local computation and storage. It provides reliable, scalable, and distributed computing and storage for big data applications.
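The split/apply/combine shape described above can be sketched on a single machine with a thread pool. This is a toy analogue of Hadoop's model, not Hadoop itself; the chunk size and worker count are arbitrary choices for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Work done independently on one partition of the data."""
    return sum(chunk)

data = list(range(1000))
# Split the dataset into independent partitions of 250 items each.
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]

# Each partition is processed in parallel, then the results are combined,
# the same shape Hadoop applies across HDFS blocks on a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))
```

On a cluster, the "partitions" are blocks already distributed across machines, so the computation moves to the data rather than the data to the computation.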
Presentation at ISKO-UK Linked Data: The Future of Knowledge Organization on the Web conference. More at http://www.iskouk.org/events/linked_data_sep2010.htm
Retail banks are moving beyond the data warehouse and data lake and are now implementing data fabric architectures to address data discovery and integration challenges.
These are the slides from our webinar "Modern Data Discovery and Integration in Retail Banking" in which we explore the role of the data discovery and integration layer in a data fabric with special focus on evolution from data warehouse to data fabric, semantics and graph data models in data fabric and example use cases in retail banks and B2C financial services.
Watch full webinar on demand here: https://goo.gl/yqzxUP
Market research shows that around 70% of self-service initiatives fare “average” or below. The Denodo 7.0 information self-service tool offers data analysts, business users, and app developers the ability to search and browse data and metadata in a business-friendly manner for self-service exploration and analytics.
Attend this session to learn:
- How business users can use the Denodo Platform's integrated Google-like search across both content and catalog
- How business users can refine queries without SQL knowledge using the web-based query UI
- How tags and business categorization help standardize business/canonical views while decoupling development artifacts from business users
Agenda:
The role of information self-service tool
Product demonstration
Summary & Next Steps
Q&A
Unlock Your Data for ML & AI using Data Virtualization - Denodo
How Denodo Complements a Logical Data Lake in the Cloud
● Denodo does not substitute data warehouses, data lakes, ETLs...
● Denodo enables the use of all of them together, plus other data sources:
○ In a logical data warehouse
○ In a logical data lake
○ They are very similar; the only difference is in the main objective
● There are also use cases where Denodo can be used as a data source in an ETL flow
This document provides an introduction and overview of big data technologies. It begins with defining big data and its key characteristics of volume, variety and velocity. It discusses how data has exploded in recent years and examples of large scale data sources. It then covers popular big data tools and technologies like Hadoop and MapReduce. The document discusses how to get started with big data and learning related skills. Finally, it provides examples of big data projects and discusses the objectives and benefits of working with big data.
Concepts, use cases and principles to build big data systems (1)Trieu Nguyen
1) Introduction to the key Big Data concepts
1.1 The Origins of Big Data
1.2 What is Big Data ?
1.3 Why is Big Data So Important ?
1.4 How Is Big Data Used In Practice ?
2) Introduction to the key principles of Big Data Systems
2.1 How to design Data Pipeline in 6 steps
2.2 Using Lambda Architecture for big data processing
3) Practical case study : Chat bot with Video Recommendation Engine
4) FAQ for student
This document discusses big data, defining it as large volumes of diverse data that are growing rapidly and requiring new techniques to capture, curate, manage, and analyze. It covers the key characteristics of big data including volume, velocity, and variety. The document also outlines common sources of big data, tools used to manage and analyze it, applications of big data analytics, risks and benefits, and the future growth of big data.
Watch here: https://bit.ly/3i2iJbu
You will often hear that "data is the new gold". In this context, data management is one of the areas that has received more attention by the software community in recent years. From Artificial Intelligence and Machine Learning to new ways to store and process data, the landscape for data management is in constant evolution. From the privileged perspective of an enterprise middleware platform, we at Denodo have the advantage of seeing many of these changes happen.
Join us for an exciting session that will cover:
- The most interesting trends in data management.
- Our predictions on how those trends will change the data management world.
- How these trends are shaping the future of data virtualization and our own software.
Enabling data scientists within an enterprise requires a well-thought out approach from an organization, technology, and business results perspective. In this talk, Tim and Hussain will share common pitfalls to data science enablement in the enterprise and provide their recommendations to avoid them. Taking an example, actionable use case from the financial services industry, they will focus on how Anaconda plays a pivotal role in setting up big data infrastructure, integrating data science experimentation and production environments, and deploying insights to production. Along the way, they will highlight opportunities for leveraging open source and unleashing data science teams while meeting regulatory and compliance challenges.
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
Joe Caserta, President at Caserta Concepts addressed the challenges of Business Intelligence in the Big Data world at the Third Annual Great Lakes BI Summit in Detroit, MI on Thursday, March 26. His talk "Architecting for Big Data: Trends, Tips and Deployment Options," focused on how to supplement your data warehousing and business intelligence environments with big data technologies.
For more information on this presentation or the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/.
This document provides an overview of big data, including its definition, characteristics, sources, tools used, applications, benefits, and impact on IT. Big data is a term used to describe the large volumes of data, both structured and unstructured, that are so large they are difficult to process using traditional database and software techniques. It is characterized by high volume, velocity, variety, and veracity. Common sources of big data include mobile devices, sensors, social media, and software/application logs. Tools like Hadoop, MongoDB, and MapReduce are used to store, process, and analyze big data. Key applications areas include homeland security, healthcare, manufacturing, and financial trading. Benefits include better decision making, cost reductions
This document discusses trends in data analytics. It begins by defining big data and how it differs from traditional data approaches in terms of size, techniques, and ability to solve new problems. It then provides examples of big data applications across various industries like retail, automotive, healthcare, and insurance. Specifically, it outlines how big data is used for predictive analytics, personalization, fraud detection, and risk adjustment. Finally, it discusses some risks of big data like privacy issues and ensuring the right problems are addressed.
Analytical Innovation: How to Build the Next Generation Data PlatformVMware Tanzu
There was a time when the Enterprise Data Warehouse (EDW) was the only way to provide a 360-degree analytical view of the business. In recent years many organizations have deployed disparate analytics alternatives to the EDW, including: cloud data warehouses, machine learning frameworks, graph databases, geospatial tools, and other technologies. Often these new deployments have resulted in the creation of analytical silos that are too complex to integrate, seriously limiting global insights and innovation.
Join guest speaker, 451 Research’s Jim Curtis and Pivotal’s Jacque Istok for an interactive discussion about some of the overarching trends affecting the data warehousing market, as well as how to build a next generation data platform to accelerate business innovation. During this webinar you will learn:
- The significance of a multi-cloud, infrastructure-agnostic analytics
- What is working and what isn’t, when it comes to analytics integration
- The importance of seamlessly integrating all your analytics in one platform
- How to innovate faster, taking advantage of open source and agile software
Speakers: James Curtis, Senior Analyst, Data Platforms & Analytics, 451 Research & Jacque Istok, Head of Data, Pivotal
This document summarizes a talk on using big data driven solutions to combat COVID-19. It discusses how big data preparation involves ingesting, cleansing, and enriching data from various sources. It also describes common big data technologies used for storage, mining, analytics and visualization including Hadoop, Presto, Kafka and Tableau. Finally, it provides examples of research projects applying big data and AI to track COVID-19 cases, model disease spread, and optimize health resource utilization.
In this slidedeck, Infochimps Director of Product, Tim Gasper, discusses how Infochimps tackles business problems for customers by deploying a comprehensive Big Data infrastructure in days; sometimes in just hours. Tim unlocks how Infochimps is now taking that same aggressive approach to deliver faster time to value by helping customers develop analytic applications with impeccable speed.
Watch this webinar in full here: https://buff.ly/2MVTKqL
Self-Service BI promises to remove the bottleneck that exists between IT and business users. The truth is, if data is handed over to a wide range of data consumers without proper guardrails in place, it can result in data anarchy.
Attend this session to learn why data virtualization:
• Is a must for implementing the right self-service BI
• Makes self-service BI useful for every business user
• Accelerates any self-service BI initiative
This document discusses big data and semantic web technologies in manufacturing. It outlines how big data is being used in manufacturing for applications like product quality tracking, supply chain management, and forecasting. Semantic web technologies like ontologies and semantic layers are discussed as ways to add meaning to manufacturing data and integrate information from different systems. A case study on using an ontology called ECOS and a text mining tool called TEXT2RDF to extract semantics from manufacturing documents is presented.
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?Denodo
Watch full webinar here: https://bit.ly/3Y2TBXB
Two of the most talked about topics in data management today are Data Fabric and Data Mesh. However, there is a lot of confusion around them. Are they alternative options, or are they complementary? Many organizations are struggling with these questions when trying to modernize their data architecture. Mike Ferguson, Managing Director of Intelligent Business Strategies, will help clear up the confusion by looking at what Data Fabric and Data Mesh are and how they can best be used to help shorten time to value in companies seeking to become data-driven enterprises.
Mike will help address many of your questions, including:
- What is a Data Fabric and Data Mesh, and the business value of each?
- What are the key concepts and capabilities of each, and what do they make possible?
- The implications of decentralizing data engineering, and how do you co-ordinate data product development?
- How can a Data Fabric help in building a Data Mesh?
Following Mike's presentation, we will be joined by Kevin Bohan of Denodo, who will discuss the foundational capabilities you should be putting in place if you are planning on adopting a Data Mesh strategy.
Webinar: Faster Big Data Analytics with MongoDBMongoDB
Learn how to leverage MongoDB and Big Data technologies to derive rich business insight and build high performance business intelligence platforms. This presentation includes:
- Uncovering Opportunities with Big Data analytics
- Challenges of real-time data processing
- Best practices for performance optimization
- Real world case study
This presentation was given in partnership with CIGNEX Datamatics.
The document provides an overview of big data analytics. It defines big data as high-volume, high-velocity, and high-variety information assets that require cost-effective and innovative forms of processing for insights and decision making. Big data is characterized by the 3Vs - volume, velocity, and variety. The emergence of big data is driven by the massive amount of data now being generated and stored, availability of open source tools, and commodity hardware. The course will cover Apache Hadoop, Apache Spark, streaming analytics, visualization, linked data analysis, and big data systems and AI solutions.
Slides from a recent Big Data Warehousing Meetup titled, Big Data Analytics with Microsoft.
See Power Pivot/ Power Query/ Power View/ Power Maps and Azure Machine Learning be used to analyze Big Data.
One challenge of dealing with Big Data project is to acquire both structured and instructed information in order to find the right correlation. During the event, we explained all the steps to build your model and enhance your existing data through Microsoft's Power BI.
We had an in-depth discussion about the innovations built into the latest stack of Microsoft Business Intelligence, and practical tips from Technology Specialist’s from Microsoft.
The session also featured demos to help you see the technology as an end-to-end solution.
For more information, visit www.casertaconcepts.com
Watch full webinar here: https://bit.ly/3mdj9i7
You will often hear that "data is the new gold"? In this context, data management is one of the areas that has received more attention from the software community in recent years. From Artificial Intelligence and Machine Learning to new ways to store and process data, the landscape for data management is in constant evolution. From the privileged perspective of an enterprise middleware platform, we at Denodo have the advantage of seeing many of these changes happen.
In this webinar, we will discuss the technology trends that will drive the enterprise data strategies in the years to come. Don't miss it if you want to keep yourself informed about how to convert your data to strategic assets in order to complete the data-driven transformation in your company.
Watch this on-demand webinar as we cover:
- The most interesting trends in data management
- How to build a data fabric architecture?
- How to manage your data integration strategy in the new hybrid world
- Our predictions on how those trends will change the data management world
- How can companies monetize the data through data-as-a-service infrastructure?
- What is the role of voice computing in future data analytic
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
1. IATECH MÁLAGA
BIGOWL: Using Semantics to Develop Big Data Analytics Solutions
José Manuel García Nieto
jnieto@lcc.uma.es
2. IATECH MÁLAGA
Outline
• Introduction
• Concepts and Background
• Current practices in Big Data analytics
• Semantic modelling
• Overall approach
• Validation: Case studies
• Discussions
• Conclusions
BIGOWL: Using Semantics to Develop Big Data Analytics Solutions
3. IATECH MÁLAGA
Introduction
• Motivation
• Gartner’s report: an emerging challenge in Big Data is to construct data-driven intelligent applications that capture and inject domain knowledge into the analytical processes, including context, using a standardized format
• Context refers to all the relevant (meta-)information that supports the analysis and helps interpret its results
• This will facilitate integration (in a standardized way) with third parties’ data, algorithms, business intelligence (BI) and visualization services
4. IATECH MÁLAGA
Introduction
• Motivation
• The use of semantics as contextual information will enhance the analytical power of the algorithms, as well as the reuse of single components in data analytics workflows (Ristoski & Paulheim, 2016)
• Ways to make the domain knowledge explicit and usable are needed to improve data processing and analysis tasks
• Semantic Web technology can be used to annotate not only the knowledge domain of the data, but also the analytics’ meta-data (Keet, Ławrynowicz et al., 2015)
9. IATECH MÁLAGA
Introduction
• Motivation
• Companies have already realised this
• The Administration (European Commission) too
https://www.big-data-europe.eu/ http://www.bigdataocean.eu/
http://www.semagrow.eu/
10. IATECH MÁLAGA
Introduction
• Motivation
• Companies have already realised this
• The Administration (European Commission) too
• In academia, we aim to go one step further
11. IATECH MÁLAGA
Introduction
• Motivation
• In academia, we aim to go one step further
• Semantic Web technologies can be used to annotate not only the knowledge domain of the data, but also the analytics’ meta-data, including: algorithms’ parameters, input variables, tuning experiences, expected behaviours and taxonomies
• This will facilitate the proper reuse and composition of Big Data analytics
• It will also enhance the quality of consumed and produced data
12. IATECH MÁLAGA
Introduction
• Hypothesis:
The semantic annotation of Big Data sources, components and algorithms can act as a link to capture and incorporate the domain knowledge that guides and enhances the analytical processes
• In addition, the semantic annotation can provide the background for reasoning methods based on axiomatic and rule-logic recommendations
13. IATECH MÁLAGA
Introduction
• Proposal:
• Semantic model: an ontology-driven approach to support knowledge management in Big Data analytics workflows
• The proposed ontology is called BIGOWL (BIG data analytics OWL 2 ontology), which acts as a formal schema for the representation and consolidation of knowledge in Big Data analytics
• Knowledge incorporation is in turn beneficial for efficient algorithmic performance, by taking part in operator design, parameter selection, and human-interactive and decision-making strategies
15. IATECH MÁLAGA
Concepts and Background
• Different sites and people will talk about everything from artificial intelligence to natural language processing to linked data and the Semantic Web
• What are they all?
• How do they relate to each other?
• How do they relate to you?
The Semantic Web, Web 3.0, the Linked Data Web, the Web of Data… whatever you call it, the Semantic Web represents the next major evolution in connecting information
16. IATECH MÁLAGA
Concepts and Background
• How is the “Semantic Web” different?
• The word semantic itself implies meaning or understanding
• The Semantic Web is concerned with the meaning of data, not its structure (as in relational databases or the World Wide Web itself)
17. IATECH MÁLAGA
Concepts and Background
• What standards apply to the Semantic Web?
• Mainly 4 technical standards:
• An Ontology provides a formal representation of the real world: an explicit description of the concepts in a domain of discourse (classes), the properties of each concept describing its features and attributes, and restrictions on those properties
• RDF (Resource Description Framework): the data modelling language of the Semantic Web. All Semantic Web information is stored and represented in RDF
• SPARQL (SPARQL Protocol and RDF Query Language): the query language of the Semantic Web, specifically designed to query data across various systems
• OWL (Web Ontology Language): the schema language, or knowledge representation (KR) language, of the Semantic Web. OWL enables you to define concepts composably so that they can be reused as much and as often as possible
18. IATECH MÁLAGA
Concepts and Background
• What standards apply to the Semantic Web?
• Imagine two relational tables in two different databases: movies and cinema rooms
• Imagine a program that can automatically query your Web site and any other site that has movie scheduling information, in order to show a complete view in one place
The goal of Linked Data is to publish structured data in such a way that it can be easily consumed and combined with other Linked Data
19. IATECH MÁLAGA
Concepts and Background
• How is the “Semantic Web” different?
• Linked Data is the Semantic Web realized via four best-practice principles:
• Use URIs as names for things (an example of a URI is any URL)
• Use HTTP URIs so that people can look up those names
• When someone looks up a URI, provide useful information, using standards such as RDF and SPARQL
• Include links to other URIs so that they can discover more things
20. IATECH MÁLAGA
Concepts and Background
• How is the “Semantic Web” different?
• Once all the rows of our tables have been uniquely identified, made dereferenceable through HTTP, and described with RDF, the last step is providing links between rows across different tables
• The main aim here is to make explicit the links that were implicit before shifting to the Linked Data approach. In our example, movies would be linked to the theatres in which they are playing
21. IATECH MÁLAGA
Concepts and Background
• How is the “Semantic Web” different?
• Once our tables have been published this way, the Linked Data rules do their magic: people across the Web can easily start referencing and consuming the data in our rows
• If we go further and link our movies to popular external data sets such as Wikipedia and IMDb, we make it even easier for people and computers to consume our data and combine it with other data
• This provides our data with context
22. IATECH MÁLAGA
Current practices in Big Data analytics
• In current Big Data technology ecosystems, when facing a specific data analytics task, it is usual to rely on already existing tools
23. IATECH MÁLAGA
Current practices in Big Data analytics
• Besides technological or commercial aspects, current Big Data platforms still follow the common procedure when facing data analytics tasks (ACM-SIGKDD, 2014), which comprises the typical steps of classical KDD:
• data collection,
• data transformation,
• data mining,
• pattern evaluation, and
• knowledge presentation (visualization)
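The five classical KDD steps above can be sketched as a toy pipeline of plain Python functions; the records and the "model" (a mean speed) are trivial stand-ins, not part of any real platform.

```python
def collect():
    # stand-in for data collection, e.g. reading from a sensor feed
    return [{"speed": 42.0}, {"speed": None}, {"speed": 58.5}]

def transform(records):
    # data transformation: drop records with missing values
    return [r for r in records if r["speed"] is not None]

def mine(records):
    # data mining: a trivial "model", the mean speed
    return sum(r["speed"] for r in records) / len(records)

def evaluate(pattern):
    # pattern evaluation: accept the pattern only if it is plausible
    return 0.0 < pattern < 200.0

def present(pattern):
    # knowledge presentation (visualization stand-in)
    return f"mean speed: {pattern:.2f}"

data = transform(collect())
pattern = mine(data)
if evaluate(pattern):
    print(present(pattern))
```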
24. IATECH MÁLAGA
Semantic modelling
• The semantic model: BIGOWL
• An ontological scheme driving the whole process of Big Data analytics
• It is the terminological box (TBox) that defines the vocabulary, with concepts and properties (relationships), in the domain of Big Data analysis
28. IATECH MÁLAGA
Validation: Case studies
• Case study 1: streaming processing of New York City traffic open data
• A dynamic version of the bi-objective Traveling Salesman Problem (TSP): minimize the travel time and the distance to cover certain routing points in an urban area
• Open Data API provided by the New York City Department of Transportation
• Updates traffic information several times per minute
29. IATECH MÁLAGA
Validation: Case studies
• Case study 1: streaming processing of New York City traffic open data
• Analyser: the multi-objective metaheuristic NSGA-II provided in jMetalSP, which allows parallel processing of evaluation functions in an Apache Spark environment
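The bi-objective evaluation itself can be sketched in plain Python (jMetalSP is a Java library, so this is only an illustration of the objectives, not its API); the per-edge distance and time matrices below are hypothetical stand-ins for the values that the streaming traffic feed would refresh.

```python
# Hypothetical distances (km) and travel times (min) between 4 routing points
dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
time = [[0, 5, 20, 25], [5, 0, 15, 9], [20, 15, 0, 7], [25, 9, 7, 0]]

def evaluate(route):
    """Return the two objectives (total_time, total_distance) for a closed tour."""
    legs = list(zip(route, route[1:] + route[:1]))  # consecutive legs, back to start
    return (sum(time[a][b] for a, b in legs),
            sum(dist[a][b] for a, b in legs))

print(evaluate([0, 1, 2, 3]))  # → (52, 21): the tour 0→1→2→3→0
```

An algorithm such as NSGA-II searches for routes that trade off these two objectives, re-evaluating them as the streaming data updates the matrices.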
30. IATECH MÁLAGA
Validation: Case studies
• Case study 1: Streaming processing of New York City traffic open-data
• Workflow
31. IATECH MÁLAGA
Validation: Case studies
• Case study 1: Streaming processing of New York City traffic open-data
• Ontology definition of this workflow
32. IATECH MÁLAGA
Validation: Case studies
• Case study 1: Streaming processing of New York City traffic open-data
• Workflow
33. IATECH MÁLAGA
Validation: Case studies
• Case study 1: streaming processing of New York City traffic open data
• Semantic annotation and querying
34. IATECH MÁLAGA
Validation: Case studies
• Case study 2: the academic problem of Iris flower classification
• Classification algorithm: the J48 decision tree
• UCI Repository
35. IATECH MÁLAGA
Validation: Case studies
• Case study 2: the academic problem of Iris flower classification
• For materialization, two different approaches have been used in this case:
• the well-known data mining library Weka, and
• the BigML SaaS API for analysis in the cloud
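For a rough local equivalent of this classification step, scikit-learn's `DecisionTreeClassifier` can stand in for J48; note that scikit-learn implements CART rather than C4.5, so this only approximates the Weka/BigML setup described in the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# The classic UCI Iris dataset, bundled with scikit-learn
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Train a decision tree and evaluate on the held-out split
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(f"test accuracy: {tree.score(X_te, y_te):.2f}")
```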
36. IATECH MÁLAGA
Validation: Case studies
• Case study 2: the academic problem of Iris flower classification
• Ontology definition of this workflow
37. IATECH MÁLAGA
Validation: Case studies
• Case study 2: the academic problem of Iris flower classification
• Analytic workflow
38. IATECH MÁLAGA
Validation: Case studies
• Case study 2: the academic problem of Iris flower classification
• Semantic annotation and querying
39. IATECH MÁLAGA
Validation: Case studies
• Case study 3: reasoning
• SWRL rules perform semantic reasoning jobs mainly devoted to checking the correctness of workflows, i.e., to discover components and tasks with (non-)compatible connectivity of inputs/outputs, execution orders, data domains, data formats, data types, etc.
• The SWRL rules are then evaluated by the reasoner after classifying the Big Data components in accordance with the axioms
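The kind of check such a rule encodes can be illustrated in plain Python; a real SWRL rule would be evaluated by an OWL reasoner, and the component names and formats below are hypothetical.

```python
# Hypothetical workflow components with declared I/O formats
components = {
    "csvReader": {"inputs": None, "outputs": "CSV"},
    "j48Task": {"inputs": "ARFF", "outputs": "Model"},
}
connections = [("csvReader", "j48Task")]

def incompatible(conns, comps):
    """Yield connections whose output format does not match the consumer's input."""
    for src, dst in conns:
        if comps[src]["outputs"] != comps[dst]["inputs"]:
            yield (src, dst)

# The CSV→ARFF mismatch is flagged, just as an SWRL rule would flag it
print(list(incompatible(connections, components)))
```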
40. IATECH MÁLAGA
Validation: Case studies
• Case study 3: Reasoning
• SWRL rules to check correctness of workflows
41. IATECH MÁLAGA
Conclusions
• Experience in the case studies revealed that the BIGOWL approach is useful for integrating domain knowledge concerning a specific analytics problem
• Consequently, the integrated knowledge is used to guide the design of Big Data analytics workflows, by recommending the next components to be linked and supporting final validation
42. IATECH MÁLAGA
Research agenda
• A first phase: provide automatic facilities for ontology population, hence enriching the semantic approach
• Generate new and heterogeneous use cases of analytics workflows, which would lead us to find and solve possible new deficiencies, as well as to enrich the knowledge base
43. IATECH MÁLAGA
BIGOWL: Using Semantics to Develop Big Data Analytics Solutions
José Manuel García Nieto
jnieto@lcc.uma.es