SystemT: Declarative Information Extraction (invited talk at MIT CSAIL) - Laura Chiticariu
Invited talk at MIT CSAIL, March 8, 2016
Information extraction (IE), the task of extracting structured information from unstructured or semi-structured data, is increasingly important to a wide array of enterprise applications, ranging from Business Intelligence to Data-as-a-Service. Such applications drive the following main requirements for IE systems: accuracy, scalability, expressivity, transparency, and customizability.
SystemT, a declarative IE system, has been designed and developed to address these requirements. It is built on the basic principle underlying relational database technology: complete separation of specification from execution. SystemT uses AQL, a declarative language for expressing NLP algorithms, and an optimizer that generates high-performance algebraic execution plans for AQL rules. This makes IE orders of magnitude more scalable and easier to use, maintain, and customize.
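The snippet below is a minimal Python sketch of that separation-of-specification-from-execution principle; it is not AQL itself, and the rule names and patterns are purely illustrative:

    import re

    # Specification: what to extract, declared as data, with no execution details.
    RULES = {
        "phone": r"\b\d{3}-\d{3}-\d{4}\b",
        "email": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
    }

    def run_rules(text, rules):
        # Execution: a separate engine, free to reorder or optimize rule evaluation.
        return {name: re.findall(pattern, text) for name, pattern in rules.items()}

    print(run_rules("Call 555-123-4567 or mail jane@example.com", RULES))

In SystemT, the analogous split is what lets the AQL optimizer pick an efficient algebraic plan without the rule author changing a line of specification.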
SystemT ships today with multiple products across 4 IBM Software Brands. Furthermore, SystemT is used in multiple ongoing research projects and is being taught in universities. Our ongoing research and development efforts focus on making SystemT more usable for both technical and business users, and on continuing to enhance its core functionality based on natural language processing, machine learning, and database technology.
Using the right data model in a data mart - David Walker
A presentation describing how to choose the right data model design for your data mart. It discusses the pros and cons of different data models with different RDBMS technologies and tools.
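As one hedged illustration of the kind of design choice discussed, the Python snippet below builds a minimal star schema (a common data-mart model) in SQLite; the table and column names are assumptions, not taken from the slides:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- dimension tables hold descriptive attributes
        CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
        -- the fact table holds measures keyed by the dimensions
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            units       INTEGER,
            revenue     REAL
        );
    """)
    print("star schema created")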
Domain Driven Data: Apache Kafka® and the Data Mesh - Confluent
James Gollan, Confluent, Senior Solutions Engineer
From digital banking to Industry 4.0, the nature of business is changing. Increasingly, businesses are becoming software businesses, and the lifeblood of software is data. Dealing with data at the enterprise level is tough, and there have been some missteps along the way.
This session will consider the increasingly popular idea of a 'data mesh' - the problems it solves and, perhaps most importantly, how an event streaming platform forms the bedrock of this new paradigm.
Recording to be available at cnfl.io/meetup-hub
https://www.meetup.com/KafkaMelbourne/events/277076626/
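As a rough illustration of the event-streaming backbone the session describes, the Python sketch below publishes a domain event to Kafka with the confluent-kafka client; the topic name, broker address, and payload are assumptions:

    import json
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})

    # A domain team exposes its events as a stream other teams can consume.
    event = {"account_id": "A-1001", "type": "deposit", "amount_cents": 25000}
    producer.produce(
        "banking.transactions",   # domain-owned topic (assumed name)
        key=event["account_id"],
        value=json.dumps(event),
    )
    producer.flush()  # block until the broker acknowledges delivery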
Creating a Data Driven Organization - StampedeCon 2016 - StampedeCon
Companies today are all focused on finding new consumption models to better utilize the data they produce. This presentation will provide insights and best practices for creating the organization and sponsorship necessary to set the foundation for success.
For this session, Dan will provide an overview of the process and methodologies he employs to establish and sustain a Data Driven Culture. Key topics will include:
Data Driven Culture
Executive Sponsorship
Organizational Structure – Collaboration Hubs and Bi-Modal Analytics
Role of Hadoop and Big Data as Part of Data Driven Culture
Hybrid Cloud Journey - Maximizing Private and Public Cloud - Ryan Lynn
This presentation walks through the elements of private and public cloud and how to start looking at use cases for hybrid cloud architectures. It covers benefits, statistics, trends and practical next steps for your hybrid cloud journey.
Live presentation of some of this content: https://www.youtube.com/watch?v=9_5yJr0HKw4&t=13s
Istio is a service mesh, and it's a cool new project from Google, IBM, Lyft, and others. This talk describes at a high level how Istio works as a sidecar, and how it works well with Weave Cloud, which provides visualization to help you understand what's going on when you deploy Istio, plus long-term Prometheus metrics storage via its built-in Prometheus service.
Watch full webinar here: https://bit.ly/3mdj9i7
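For a flavor of the Prometheus side, the hedged Python sketch below queries a Prometheus-compatible endpoint (such as the one Weave Cloud provides) for a standard Istio sidecar metric; the endpoint URL is an assumption:

    import requests

    resp = requests.get(
        "http://localhost:9090/api/v1/query",  # assumed Prometheus endpoint
        params={"query": "sum(rate(istio_requests_total[5m]))"},
    )
    resp.raise_for_status()
    for result in resp.json()["data"]["result"]:
        print(result["metric"], result["value"])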
You will often hear that "data is the new gold". In this context, data management is one of the areas that has received the most attention from the software community in recent years. From Artificial Intelligence and Machine Learning to new ways to store and process data, the data management landscape is in constant evolution. From the privileged perspective of an enterprise middleware platform, we at Denodo have the advantage of seeing many of these changes happen.
In this webinar, we will discuss the technology trends that will drive the enterprise data strategies in the years to come. Don't miss it if you want to keep yourself informed about how to convert your data to strategic assets in order to complete the data-driven transformation in your company.
Watch this on-demand webinar as we cover:
- The most interesting trends in data management
- How to build a data fabric architecture
- How to manage your data integration strategy in the new hybrid world
- Our predictions on how those trends will change the data management world
- How companies can monetize data through a data-as-a-service infrastructure
- The role of voice computing in future data analytics
Every business today wants to leverage data to drive strategic initiatives with machine learning, data science and analytics — but runs into challenges from siloed teams, proprietary technologies and unreliable data.
That’s why enterprises are turning to the lakehouse: it offers a single platform to unify all your data, analytics and AI workloads.
Join our How to Build a Lakehouse technical training, where we’ll explore how to use Apache Spark™, Delta Lake, and other open source technologies to build a better lakehouse. This virtual session will include concepts, architectures and demos.
Here’s what you’ll learn in this 2-hour session:
How Delta Lake combines the best of data warehouses and data lakes for improved data reliability, performance and security
How to use Apache Spark and Delta Lake to perform ETL processing, manage late-arriving data, and repair corrupted data directly on your lakehouse
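A minimal sketch (not the training's exact code) of the late-arriving-data case: a Delta Lake MERGE from PySpark that upserts a late batch into an existing table. The paths and join key are assumptions; it requires the delta-spark package:

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("late-arrivals")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    target = DeltaTable.forPath(spark, "/lakehouse/events")  # assumed table path
    late = spark.read.json("/staging/late_events.json")      # late-arriving batch

    # Upsert: update rows that already exist, insert the ones that don't.
    (target.alias("t")
     .merge(late.alias("s"), "t.event_id = s.event_id")
     .whenMatchedUpdateAll()
     .whenNotMatchedInsertAll()
     .execute())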
In this session, we will briefly introduce workloads such as HPC, big data, backup, and VDI that can be adopted right away in line with the Financial Supervisory Service's cloud usage guidelines, and present reference architectures for building them on AWS, along with their key advantages and customer case studies.
Speaker: 정영준, Solutions Architect, Amazon Web Services
Building Data Pipelines with Spark and StreamSets - Pat Patterson
Big data tools such as Hadoop and Spark allow you to process data at unprecedented scale, but keeping your processing engine fed can be a challenge. Metadata in upstream sources can ‘drift’ due to infrastructure, OS and application changes, causing ETL tools and hand-coded solutions to fail. StreamSets Data Collector (SDC) is an Apache 2.0 licensed open source platform for building big data ingest pipelines that allows you to design, execute and monitor robust data flows. In this session we’ll look at how SDC’s “intent-driven” approach keeps the data flowing, with a particular focus on clustered deployment with Spark and other exciting Spark integrations in the works.
A Kafka journey and why migrate to Confluent Cloud? - Confluent
Using a success story as an example, we talk about how FRSHUB, with the Apache Kafka ecosystem, became an optimal business intelligence platform that interprets data from multiple sources to show real-time and predictive views of the data for different business units. We talk about our IaC deployment in Azure, the API management, and why we are migrating to Confluent Cloud.
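As a hedged sketch of what the client-side migration amounts to, pointing the same confluent-kafka producer at Confluent Cloud is mostly a configuration change; every value below is a placeholder, not a real endpoint or credential:

    from confluent_kafka import Producer

    conf = {
        "bootstrap.servers": "pkc-xxxxx.westeurope.azure.confluent.cloud:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "<CLUSTER_API_KEY>",
        "sasl.password": "<CLUSTER_API_SECRET>",
    }
    producer = Producer(conf)
    producer.produce("frshub.metrics", value=b"hello from the cloud")  # assumed topic
    producer.flush()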
Iceberg: a modern table format for big data (Ryan Blue & Parth Brahmbhatt, Netflix)
Presto Summit 2018 (https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/)
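The talk's slides aren't reproduced here, but as a rough sketch of what Iceberg looks like in practice, the PySpark snippet below creates a table using Iceberg's hidden partitioning; the catalog name, warehouse path, and schema are assumptions, and it requires the iceberg-spark-runtime jar on the classpath:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("iceberg-demo")
        .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.demo.type", "hadoop")
        .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
        .getOrCreate()
    )

    # Hidden partitioning: queries never need to mention the partition layout.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS demo.db.events (
            id BIGINT, ts TIMESTAMP, payload STRING
        ) USING iceberg
        PARTITIONED BY (days(ts))
    """)
    spark.sql("SELECT * FROM demo.db.events").show()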
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Dragan Berić - DataScienceConferenc1
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
Explore the challenges of migrating existing applications to a cloud infrastructure, then see proven strategies for mitigating the risks. You'll learn how to:
- Prioritize application migration
- Plan for “big-blocks”
- Assess your existing applications' ‘fit’ for cloud
- Leverage a centralized development testing platform to align your business goals with coding decisions
- Create a cloud migration policy
- Use process to mitigate the risks associated with cloud migration
An insight into NoSQL solutions implemented at RTV Slovenia and elsewhere, the problems we are trying to solve, and an introduction to solving them with Redis.
Talk given at #wwwh @ Ljubljana, 30.1.2013 by me, Tit Petric
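As a hedged sketch of the kind of problem Redis is often brought in for (the talk's specifics aren't reproduced here), the Python snippet below caches an expensive lookup with a TTL; the host and key names are illustrative, and it requires the redis package:

    import redis

    r = redis.Redis(host="localhost", port=6379)

    def get_article(article_id):
        key = f"article:{article_id}"
        cached = r.get(key)
        if cached is not None:
            return cached.decode()                 # cache hit
        value = f"rendered article {article_id}"   # stand-in for a slow DB/render call
        r.set(key, value, ex=60)                   # cache for 60 seconds
        return value

    print(get_article(42))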