The document describes IBM's InfoSphere Stewardship Center and Data Quality Exception Console. The Stewardship Center provides a single collaborative environment for business users to define and monitor compliance with data quality policies and manage data quality issues to resolution. It addresses the needs of various governance roles through customizable interfaces. The Stewardship Center integrates with IBM BPM to manage governance and data quality processes. The Data Quality Exception Console displays exceptions identified by Information Analyzer, DataStage/QualityStage, and the Information Governance Catalog and allows users to collaborate to resolve them.
Introduction to Google BigQuery. Slides used at the first GDG Cloud meetup in Brussels, about big data on Google Cloud Platform. (http://www.meetup.com/GDG-Cloud-Belgium/events/228206131)
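As a hedged illustration of the kind of query such a session covers, the sketch below runs a standard-SQL aggregation with the google-cloud-bigquery Python client; the public dataset and default-credential setup are assumptions for the example, not something the slides prescribe.

```python
# Minimal sketch: an aggregation query via the google-cloud-bigquery client.
# Assumes application-default credentials and a configured project; the
# public dataset below is only an example.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
for row in client.query(sql):  # iterating waits for the job and streams rows
    print(row["name"], row["total"])
```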
Change Data Capture (CDC) - Most RDBMS vendors have a version of it, and most data warehouse professionals have built it in one form or another. This presentation defines CDC and its close relative, changed data capture. It explains how the reasons for CDC and the destinations for the captured changes drive how best to capture change data, and exposes the pitfalls of processing change data into each of those destinations. Attendees will know when to use which CDC method, how to process the captured changes into their destinations, and be able to give a clear rationale for their choice.
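To make one of the contrasted methods concrete, here is a hedged sketch of timestamp-based ("audit column") capture, the simplest CDC approach; the table, columns, and sqlite3 source are illustrative stand-ins, not the presenter's design.

```python
# Timestamp-based CDC sketch: pull rows changed since the last high-water mark.
# Table and column names are hypothetical; sqlite3 stands in for any RDBMS.
import sqlite3

conn = sqlite3.connect("source.db")
last_extracted = "2024-01-01 00:00:00"  # high-water mark persisted between runs

changed = conn.execute(
    "SELECT id, payload, updated_at FROM orders WHERE updated_at > ?",
    (last_extracted,),
).fetchall()

for row_id, payload, updated_at in changed:
    print("changed row:", row_id, payload)  # hand off to the destination load
```

Note that this method misses deletes and intermediate states, which is exactly the kind of pitfall the talk contrasts against log-based capture.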
This document provides an overview of data warehousing concepts including:
- The key differences between operational systems and data warehouses in terms of design, usage, and data characteristics.
- The benefits of implementing a data warehouse for business intelligence and decision making.
- Common data warehousing architectures and approaches including top-down, bottom-up, and hybrid approaches.
- Fundamental data modeling techniques for data warehouses, including entity-relationship modeling and dimensional modeling (a minimal star-schema sketch follows this list).
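As a hedged illustration of dimensional modeling, the sketch below builds a tiny star schema (one fact table, two dimensions) using stdlib sqlite3 so it runs anywhere; all table names are invented for the example.

```python
# Star-schema sketch: a fact table referencing two dimension tables.
# Built with stdlib sqlite3; table and column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        amount      REAL
    );
""")
print("star schema created")
```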
My use case is to monitor and improve overall search data quality, find unusual patterns in users' search behavior, and report the on-site intent back to the respective business stakeholders. To achieve this, I explored various big data processing engines that can process huge volumes of data with complex business logic in real time, and eventually settled on Flink stream processing. This talk showcases how I used Flink to accomplish my goal.
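A hedged sketch of what such a job can look like in PyFlink: a stream of search queries filtered by a crude "unusual pattern" rule. The in-memory source, rule, and threshold are invented for illustration; the talk's actual logic is not shown here.

```python
# PyFlink sketch: flag unusually long search queries in a stream.
# The in-memory source and length threshold are illustrative only.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
searches = env.from_collection(["red shoes", "x" * 300, "laptop bag"])

(searches
    .filter(lambda q: len(q) > 200)               # crude anomaly rule
    .map(lambda q: "suspicious query: " + q[:40])
    .print())

env.execute("search-quality-monitor")
```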
This document provides an introduction and overview of YARN (Yet Another Resource Negotiator), a framework for job scheduling and cluster resource management in Apache Hadoop. It discusses limitations of the "classical" MapReduce framework and how YARN addresses these through its separation of scheduling and application execution responsibilities across a ResourceManager and per-application ApplicationMasters. Key aspects of YARN's architecture like NodeManagers and containers are also introduced.
This document summarizes Talend Data Preparation, a self-service data preparation tool. It empowers business users and analysts to clean and prepare data in minutes rather than hours. The tool addresses common data issues like missing values, different formats, and extra steps needed to access and prepare data. It is designed for a variety of roles including business analysts, data scientists, and IT developers. The document outlines the product editions including a free desktop version and integrated subscription version. Use cases include self-service BI, big data discovery, and enabling agile data stewardship. Instructions are provided to download the free version and get started with data preparation.
Building Better Data Pipelines using Apache Airflow (Sid Anand)
Apache Airflow is a platform for authoring, scheduling, and monitoring workflows or directed acyclic graphs (DAGs). It allows users to programmatically author DAGs in Python without needing to bundle many XML files. The UI provides a tree view to see DAG runs over time and Gantt charts to see performance trends. Airflow is useful for ETL pipelines, machine learning workflows, and general job scheduling. It handles task dependencies and failures, monitors performance, and enforces service level agreements. Behind the scenes, the scheduler distributes tasks from the metadata database to Celery workers via RabbitMQ.
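A minimal sketch of the "DAGs as Python code" idea the summary describes; the task names, schedule, and BashOperator commands are placeholders, not from the deck.

```python
# Minimal Airflow DAG: two tasks with an explicit dependency, all in Python.
# Schedule, IDs, and commands are illustrative placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")
    extract >> load  # load runs only after extract succeeds
```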
The document is a 20 page comparison of ETL tools. It includes an introduction, descriptions of 4 ETL tools (Pentaho Kettle, Talend, Informatica PowerCenter, Inaplex Inaport), and a section comparing the tools on various criteria such as cost, ease of use, speed and data quality. The comparison chart suggests Informatica PowerCenter is the fastest and most full-featured tool while open source options like Pentaho Kettle and Talend offer lower costs but require more manual configuration.
Data Lakes - The Key to a Scalable Data Architecture (Zaloni)
Data lakes are central to modern data architectures. They can store all types of raw data, create refined datasets for various use cases, and provide shorter time-to-insight with proper management and governance. The document discusses how a data lake reference architecture can include landing, raw, refined, and trusted zones to enable analytics while governing data. It also outlines considerations for implementing a scalable, secure, and governed data lake platform.
Guidelines for moving from Oracle Forms to Oracle ADF and SOA (Steven Davelaar)
The document provides guidelines for moving from Oracle Forms to Oracle Application Development Framework (ADF) and Service-Oriented Architecture (SOA). It outlines some common customer quotes and pitfalls when migrating from Forms. It discusses defining a modernization strategy by analyzing the current Forms system and defining where the organization wants to go. Finally, it presents some common modernization options like moving business logic to the database, upgrading Forms, adding extensions, and migrating functionality.
Running Airflow Workflows as ETL Processes on Hadoop (clairvoyantllc)
While working with Hadoop, you'll eventually encounter the need to schedule and run workflows to perform various operations like ingesting data or performing ETL. There are a number of tools available to assist you with this type of requirement and one such tool that we at Clairvoyant have been looking to use is Apache Airflow. Apache Airflow is an Apache Incubator project that allows you to programmatically create workflows through a python script. This provides a flexible and effective way to design your workflows with little code and setup. In this talk, we will discuss Apache Airflow and how we at Clairvoyant have utilized it for ETL pipelines on Hadoop.
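As a hedged sketch of the pattern described (not Clairvoyant's actual pipeline), an Airflow task can drive a Hadoop-side step with a shell command; the HDFS paths and schedule below are invented.

```python
# Airflow-on-Hadoop sketch: schedule an HDFS ingest step from a DAG.
# Paths, schedule, and task IDs are hypothetical examples.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hadoop_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_to_hdfs",
        bash_command="hdfs dfs -put /data/incoming/*.csv /warehouse/raw/",
    )
```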
Trending use cases have pointed out the complementary nature of Hadoop and existing data management systems—emphasizing the importance of leveraging SQL, engineering, and operational skills, as well as incorporating novel uses of MapReduce to improve distributed analytic processing. Many vendors have provided interfaces between SQL systems and Hadoop but have not been able to semantically integrate these technologies while Hive, Pig and SQL processing islands proliferate. This session will discuss how Teradata is working with Hortonworks to optimize the use of Hadoop within the Teradata Analytical Ecosystem to ingest, store, and refine new data types, as well as exciting new developments to bridge the gap between Hadoop and SQL to unlock deeper insights from data in Hadoop. The use of Teradata Aster as a tightly integrated SQL-MapReduce® Discovery Platform for Hadoop environments will also be discussed.
Exactly-Once Financial Data Processing at Scale with Flink and Pinot (Flink Forward)
Flink Forward San Francisco 2022.
At Stripe we have created a complete end-to-end exactly-once processing pipeline to process financial data at scale, combining the exactly-once power of Flink, Kafka, and Pinot. The pipeline provides an exactly-once guarantee, end-to-end latency within a minute, deduplication against hundreds of billions of keys, and sub-second query latency against the whole dataset of trillions of rows. In this session we will discuss the technical challenges of designing, optimizing, and operating the whole pipeline, including Flink, Kafka, and Pinot. We will also share our lessons learned and the benefits gained from exactly-once processing.
by Xiang Zhang, Pratyush Sharma & Xiaoman Dong
Kubernetes monitoring using Prometheus stack (Juraj Hantak)
Ondrej Sika is a freelance DevOps architect and consultant who specializes in tools like Git, Docker, Kubernetes, Terraform, Ansible, and the Prometheus monitoring stack. The document discusses Prometheus, Alertmanager, and Grafana which make up the Prometheus monitoring stack. It provides examples of configuring services, rules, and dashboards to monitor applications running on Kubernetes.
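On the application side of that stack, a service typically exposes a /metrics endpoint for Prometheus to scrape; a minimal sketch with the prometheus_client library follows (port and metric name are arbitrary examples, not from the deck).

```python
# Sketch: exposing a counter for Prometheus to scrape via prometheus_client.
# Port and metric name are arbitrary; the loop stands in for real work.
import time
from prometheus_client import Counter, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled")

start_http_server(8000)  # serves /metrics on :8000
while True:
    REQUESTS.inc()       # placeholder for real request handling
    time.sleep(1)
```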
What is Change Data Capture (CDC) and Why is it Important? (FlyData Inc.)
Check out what Change Data Capture (CDC) is and why it is becoming ever more important. Slides also include useful tips on how to design your CDC implementation.
Presto At Arm Treasure Data - 2019 Updates (Taro L. Saito)
Presentation at Presto Conference Tokyo 2019
- Arm Treasure Data
- Plazma DB Indexes
- Real-time, Archive Storages
- Schema-on-read data processing
- Physical partition maintenance via presto-stella plugin
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi... (confluent)
Simon Aubury gave a presentation on using ksqlDB for various enterprise workloads. He discussed four use cases: 1) streaming ETL to analyze web traffic data, 2) data enrichment to identify customers impacted by a storm, 3) measurement and audit to verify new system loads, and 4) data transformation to quickly fix data issues. For each use case, he described how to develop pipelines and applications in ksqlDB to address the business needs in a scalable and failure-resistant manner. Overall, he advocated for understanding when ksqlDB is appropriate to use and planning systems accordingly.
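To ground the streaming-ETL use case, here is a hedged sketch of submitting a ksqlDB statement over its REST API; the server address and stream definition are invented, not Simon Aubury's actual pipelines.

```python
# Sketch: create a ksqlDB stream over the REST API's /ksql endpoint.
# URL, topic, and schema are illustrative placeholders.
import requests

resp = requests.post(
    "http://localhost:8088/ksql",
    json={
        "ksql": "CREATE STREAM web_clicks (url VARCHAR, user_id VARCHAR) "
                "WITH (KAFKA_TOPIC='clicks', VALUE_FORMAT='JSON');",
        "streamsProperties": {},
    },
)
print(resp.status_code, resp.json())
```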
In this presentation from EA Connect Days 2018 in Bonn the LeanIX Microservices Integration is examined. Find out what the benefits are and how to set it up.
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage (Animesh Singh)
This document discusses Kubeflow, an end-to-end machine learning platform for Kubernetes. It covers various Kubeflow components like Jupyter notebooks, distributed training operators, hyperparameter tuning with Katib, model serving with KFServing, and orchestrating the full ML lifecycle with Kubeflow Pipelines. It also talks about IBM's contributions to Kubeflow and shows how Watson AI Pipelines can productize Kubeflow Pipelines using Tekton.
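A hedged sketch of the pipeline-orchestration piece using the KFP SDK (v2-style decorators); the two components are trivial placeholders, not anything from the deck.

```python
# Kubeflow Pipelines sketch: two components chained into a pipeline with the
# KFP v2 SDK. Component bodies are trivial placeholders.
from kfp import dsl

@dsl.component
def train() -> str:
    return "model-v1"

@dsl.component
def deploy(model: str):
    print("deploying", model)

@dsl.pipeline(name="train-and-deploy")
def train_and_deploy():
    model_task = train()
    deploy(model=model_task.output)  # deploy consumes train's output
```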
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
OSMC 2022 | VictoriaMetrics: scaling to 100 million metrics per second by Ali... (NETWAYS)
The growth of observability trends and Kubernetes adoption generates more demanding requirements for monitoring systems. Volumes of time series data increase exponentially, and old solutions just can't keep up with the pace. The talk will cover how and why we created a new open source time series database from scratch: which architectural decisions and trade-offs we had to make in order to meet the new expectations and handle 100 million metrics per second with VictoriaMetrics. The talk will be interesting for software engineers and DevOps practitioners familiar with observability and modern monitoring systems, or for those interested in building scalable, high-performance databases for time series.
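Since VictoriaMetrics exposes a Prometheus-compatible query API, reading data back can be sketched with a plain HTTP call; the host, port, and PromQL expression below are assumptions for illustration.

```python
# Sketch: an instant PromQL query against VictoriaMetrics' Prometheus-
# compatible HTTP API. Host, port, and expression are illustrative.
import requests

resp = requests.get(
    "http://victoriametrics:8428/api/v1/query",
    params={"query": "sum(rate(http_requests_total[5m]))"},
)
print(resp.json()["data"]["result"])
```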
Exploring Java Heap Dumps (Oracle Code One 2018) - Ryan Cuprak
Memory leaks are not always simple or easy to find. Heap dumps from production systems are often gigantic (4+ gigs) with millions of objects in memory. Simple spot checking with traditional tools is woefully inadequate in these situations, especially with real data. Leaks can be entire object graphs with enormous amounts of noise. This session will show you how to build custom tools using the Apache NetBeans Profiler/Heapwalker APIs. Using these APIs, you can read and analyze Java heaps programmatically to ask really hard questions. This gives you the power to analyze complex object graphs with tens of thousands of objects in seconds.
Cloud-native Semantic Layer on Data Lake (Databricks)
With larger volumes of increasingly real-time data stored in the data lake, it becomes more complex to manage that data and serve analytics and applications. Faced with differing service interfaces, data definitions, and performance characteristics across scenarios, business users begin to lose confidence in the quality and efficiency of getting insight from data.
Simplifying Real-Time Architectures for IoT with Apache Kudu (Cloudera, Inc.)
3 Things to Learn About:
*Building scalable real time architectures for managing data from IoT
*Processing data in real time with components such as Kudu & Spark (see the sketch after this list)
*Customer case studies highlighting real-time IoT use cases
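A hedged sketch of the Kudu-plus-Spark piece referenced above: reading a Kudu table from PySpark, assuming the kudu-spark connector is on the classpath; the master address, table name, and filter are invented.

```python
# Sketch: read a Kudu table from PySpark (requires the kudu-spark package).
# Master address, table name, and filter are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kudu-iot").getOrCreate()
readings = (spark.read.format("kudu")
            .option("kudu.master", "kudu-master:7051")
            .option("kudu.table", "iot_readings")
            .load())
readings.filter("temperature > 80").show()
```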
Data Stewardship is an approach to Data Governance that formalizes accountability for managing information resources on behalf of others and in the best interests of the organization.
Data Stewardship consists of the people, organization, and processes needed to ensure that appropriately designated stewards are responsible for the governed data.
IBM InfoSphere Information Analyzer is a tool used for data profiling, data quality assessment, analysis and monitoring. It has capabilities for column analysis, primary key analysis, foreign key analysis, and cross-domain analysis. It provides data quality assessment, monitoring and rule design. Features include advanced analysis and monitoring, integrated rules analysis, and support for heterogeneous data. It helps users understand data structure, relationships and quality.
This document discusses IBM Information Server and its Information Analyzer tool. Information Analyzer allows users to understand their data by automating data discovery and profiling. It analyzes data attributes, data types, lengths, and relationships to understand an organization's data structure and quality issues. The tool reduces the time and resources required to analyze data sources for projects involving data migration, integration or quality improvement. It also enables data sharing and consistency across IBM Information Server products. Examples are provided showing how Information Analyzer helped companies address data issues and realize cost savings and productivity gains.
This document discusses data governance capabilities in IBM Streams version 4.1. It introduces integration with the IBM Information Governance Catalog for governing Streams assets and runtime activities. Key points include: Streams bundles and assets can be imported into the catalog; governance is enabled at the instance level; assets are discoverable in Streams Explorer and can be dragged into applications; and lineage and data flow can be viewed from catalog queries and reports. Future enhancements may include supporting additional Streams operators and governing additional data.
The document outlines the installation steps and notes for IBM Cognos Analytics (Cognos 11). It describes the three installation types - Ready to Run, Expand, and Custom. Ready to Run provides a full pre-configured version for quick setup while Custom allows flexibility to choose components. It also notes post-installation configuration tips like changing the JDBC driver location and data file path.
This document provides an overview of new features in IBM InfoSphere MDM version 11.3, including: improved integration with other IBM offerings like BigInsights; enhanced Salesforce integration; new healthcare, clinical data, and application licensing capabilities; improved DataStage integration; and core MDM enhancements. Key new features include probabilistic matching for BigInsights, Salesforce integration improvements, healthcare provider data warehousing enablement, clinical data services, and application-based licensing options.
IBM InfoSphere Information Server 11.3 Presentation (IBMInfoSphereUGFR)
This document summarizes new features in Information Server v11.3, including enhanced data integration, governance, and quality capabilities. Key updates include improved performance, a unified installer, expanded connectivity, and deeper integration across the information platform to accelerate value. A shared version number indicates IBM's commitment to a cohesive user experience for solving business challenges.
IBM Cognos - IBM information integration for IBM Cognos users (IBM Sverige)
How can users of IBM Cognos analysis and reporting functions have 100% confidence in the information they analyze? They must be able to see, and get explanations of, what the information means, where it comes from, and what status it has. The answer to this type of requirement, and more besides, is IBM InfoSphere Information Server, the market's most complete platform for information integration. This presentation was given at IBM Cognos Performance 2010 by Mikael Sjöstedt, InfoSphere Specialist, IBM.
IBM InfoSphere Data Architect 9.1 - Francis Arnaudiès (IBMInfoSphereUGFR)
The document discusses IBM InfoSphere Data Architect, a tool for modeling, relating, and standardizing diverse data assets. It can design and manage enterprise data models, enforce standards, leverage industry data models, and optimize existing investments. The tool is based on the Eclipse platform and allows various users like data architects, database developers, and administrators to be more productive. It provides logical, physical, and dimensional modeling capabilities as well as tools to define and enforce standards to increase quality and governance.
Building an effective data stewardship org 2014 (blacng)
This document discusses building an effective data stewardship organization at Stanford University. It outlines key factors for effective stewardship including participation, coordination, and resources. Some challenges are over-dependence on central resources, managing complex metadata ownership, and lack of broad engagement. Solutions proposed include carefully scoping initiatives, rewarding engagement, demonstrating progress through metrics, supplementing with side projects, and upgrading tools. The overall strategies are to start with available technology, embrace opportunities for expansion, and increase engagement.
1. The document discusses tips and tools for data stewardship, including planning for data management, best practices for data collection and organization, documenting workflows, creating metadata, and sharing data.
2. It emphasizes writing a data management plan, keeping raw data separate and secure, using version control and backups, and revisiting plans periodically.
3. The document encourages learning skills for data management, using resources like libraries and repositories, and embracing changes that support more open and reproducible science.
Install and config steps on the server side of Cognos Analytics 11. View the webinar video recording and download this deck: http://www.senturus.com/resources/installing-cognos-analytics-v11/.
Our Cognos installation and upgrade expert, Todd Schuman, describes the installation and configuration steps on the server side of CA11. Among the highlights: 1) new installation options, 2) new configuration options, 3) an optional gateway (not really optional), 4) changes to server-side file locations, and 5) a new fix pack methodology.
Installation and Setup for IBM InfoSphere Streams V4.0 (lisanl)
Laurie Williams is the Installation component lead on the InfoSphere Streams development team. Her presentation describes the installation and setup of IBM InfoSphere Streams V4.0 in a multi-host environment.
View related presentations and recordings from the Streams V4.0 Developers Conference at:
https://developer.ibm.com/answers/questions/183353/ibm-infosphere-streams-40-developers-conference-on.html?smartspace=streamsdev
A Presentation on Data Stewardship & Data Advocacy - the Benefits and Advantages of Implementing a Data Strategy for Businesses, originally presented to the Directorial Team at Business Link North West and the North West Development Agency.
Agencies such as the NSF and NIH require data management plans as part of research proposals and the Office of Science and Technology Policy (OSTP) is requiring federal agencies to develop plans to increase public access to results of federally funded scientific research. These slides explore sustainable data sharing models, including models for sharing restricted-use data. Demos of these models and tips for accessing public data access services are provided as well as resources for creating data management plans for grant applications.
Business Semantics for Data Governance and Stewardship (Pieter De Leenheer)
Data quality and regulations are perpetual drivers for Data Governance and Stewardship solutions that systematically monitor the execution of data policy. And yet, there is a long road ahead to achieve Trust in Data. It is still a relatively unknown topic or comes with trauma from past failed attempts; there is no political framework with executive champions, leading to reactive rather than proactive behavior, and software support is marginal.
Data Governance and Stewardship requires automation of business semantics management at its nucleus, in order to achieve a wide adoption and confluence of Data Trust between business and IT communities in the organization.
In this lecture, we start by reviewing 'C' in ICT and reflect on the dilemma: what is the most important quality of data: truth or trust? We review the wide spectrum of business semantics. We visit the different phases of data pain as a company grows, and we map their situation on this spectrum of semantics.
Next, we introduce the principles and framework for business semantics management to support data governance and stewardship focusing on the structural (what), processual (how) and organizational (who) components. We illustrate with stories from the field.
In this session we will discuss Data Governance, mainly around the Power BI platform (but also around on-premises concerns).
How do you avoid dataset hell? What are the best practices for sharing queries? Who is the famous Data Steward, and what is their role in a department or in the whole company? How do you choose the right person?
Keywords: Power Query, Data Management Gateway, Power BI Admin Center, Datastewardship, SharePoint 2013, eDiscovery
Level 200
Virtual Data Steward: Data Management 3.0 (CrowdFlower)
Every company that is serious about data governance needs data stewards. Data stewards connect business information requirements and processes with information technology capabilities. This function is essential to bridging data management policies and standards to day-to-day operational practices.
The document outlines a new data analytics unit with a 3D framework for data governance and a lambda architecture. It includes replacing old mainframes and Netezza appliances with new mainframes, implementing managed file transfer (MFT) for external data sources, and using Flume, SAS, and big data apps for ETL, analytics, and monitoring in a hybrid cloud environment.
DB2 Update Day 2015: Managing DB2 with IBM DB2 Tools - Svenn Aage (Peter Schouboe)
This document discusses JN Data's consolidation of DB2 administration tools. It summarizes:
1) JN Data consolidated its DBA tools from 3 vendors to 1 to reduce costs, while maintaining service levels and supporting new DB2 releases.
2) The consolidation process took 12 months and involved transitioning existing homegrown solutions that built on the prior tools to use the new vendor's product instead.
3) JN Data analyzed vendors' abilities to support current and future DB2 features, integrate with existing solutions, and perform required tasks to choose a replacement product.
Unlock The Value Of Your Microsoft and SAP Investments (SAP Technology)
This document discusses SAP Gateway for Microsoft, which enables easy creation of solutions so SAP data can be securely consumed and extended throughout Microsoft technologies for on premise and on demand deployment. It provides sample business use cases across different industries and roles, demonstrating how SAP Gateway for Microsoft can integrate SAP and Microsoft applications to improve productivity and business processes. The document also outlines the product roadmap and future directions for tighter integration between SAP and Microsoft clouds and platforms.
This training covers SAP Simple Finance, which provides an overview of the new SAP S/4HANA Simple Finance functionality and architecture. Key points include:
1) Simple Finance uses a universal journal to eliminate redundant data and provide a single source of truth, reducing database footprint and reconciliation efforts.
2) New features include cash and liquidity management, asset accounting, and integrated business planning.
3) Functionality is impacted through changes to transactions, tables, and configurations. The new NetWeaver Business Client is used for some configurations.
4) Training includes exercises for finance, acquire-to-retire, treasury/banking, and controlling processes.
automating_budget_books_board_packets_and_other_essential_narrative_reports (Charles Wilson)
The document discusses IBM's narrative reporting solution. It begins by outlining the problems with current manual reporting processes, such as being error-prone and time-consuming. The solution aims to automate reporting, allow simultaneous authoring with workflow tracking, and connect reports to various data sources. It then provides an overview of the IBM Cognos Disclosure Management software, which facilitates interactive report production and management. Finally, a case study example is given of an Australian food and beverage company that streamlined its financial reporting process and saved 12 days of manual work per cycle using the IBM solution.
This document discusses considerations for populating a CMDB from automated discovery data. It covers data enrichment from both user-managed and electronically discovered sources. Normalization and reconciliation processes are explained to standardize data from various tools. Performance tuning tips are provided, such as indexing databases and reducing network latency during synchronization. The benefits of fully integrating discovery data into the CMDB are described as enhanced usability and ability to relate ITSM requests to configuration items.
In today's businesses, an application going down can mean millions of dollars in lost revenue. Learn how to optimize the performance of your enterprise applications powered by MongoDB with IBM Application Performance Management (APM). IBM APM will give you full visibility into your application stack and infrastructure, track every transaction going through it, and help you diagnose problems in mere minutes. With built-in analytics to predict outages before they occur and integration directly into MMS, IBM APM is a must-have solution to keep your business-critical applications up and your revenue flowing.
This long presentation from IBM introduces updates to the OMEGAMON Performance Management Suite and other IBM monitoring products:
- The updated suites and products provide improved problem identification and resolution capabilities as well as reduced costs, increased flexibility, and lower total cost of ownership.
- Specific updates include new versions of OMEGAMON for IMS, DB2, Mainframe Networks, and Dashboards, as well as inclusion of log analysis capabilities.
- The suites integrate monitoring across the z/OS platform including databases, storage, networks, applications servers, and the operating system itself.
This document discusses an agenda for a DB2 Update Day event from March 23-27, 2015. It covers topics like DB2 trends for developers, archiving DB2 data, managing DB2 with Admin and Compare tools, the DB2 11 update for developers, native stored procedures, and analytics using QMF and extensions. It also provides more detail on the evolution of Admin and Compare tools, their key functions, and some lesser known functions.
The document discusses cloud-based self-service data discovery tools for IBM Cognos TM1 and IBM mainframes. It describes RESTful APIs and OData standards that allow for uniform data modeling and operations. Rocket Software's Rocket Discover product is presented as enabling end-to-end data discovery across disparate data sources through intuitive dashboard creation and sharing capabilities.
ANIn Chennai April 2024 | Agile Engineering: Modernizing Legacy Systems by Ana... (AgileNetwork)
Agile Network India - Chennai
Title: Agile Engineering: Modernizing Legacy Systems by Ananth Venugopal
Date: 27th April 2024
Hosted by: ClearVue Solutions Pvt. Ltd
This document summarizes NetApp's journey implementing self-service analytics. It began in 2009 by building an enterprise data warehouse and BI platform, which enabled a single source of truth but did not support discovery or self-service. In 2013, NetApp deployed Tableau and built a tier 2 data warehouse to enable self-service analytics with data mashing and faster turnaround. Today NetApp uses a dual environment with a top-down traditional BI approach for enterprise reporting and a bottom-up self-service model enabling departments to answer new questions quickly. The key is establishing governance over the self-service model through community involvement and processes for content certification, data governance, and publishing guidelines.
Placement of BPM runtime components in an SOA environment (Kim Clark)
The service oriented architecture (SOA) reference architecture is intentionally simplistic at a high level, but it holds some surprises when you look closely at how components really interact. This is especially true in relation to the placement of business process management (BPM) componentry. We discuss the most common design questions, including: Is BPM a consumer or provider of services? To what extent should a user interface be decoupled from the BPM runtime? How do we retain agility in BPM while adhering to the architectural separation of SOA? These subtleties are critical when designing solutions to reap the benefits of both SOA and BPM simultaneously.
DevOps & Continuous Test for IIB and IBM MQStuart Feasey
This document discusses the benefits of continuous testing and service virtualization. It notes that continuous testing helps enable agile practices across the development lifecycle by allowing teams to test earlier with greater coverage at lower cost. It also discusses how service virtualization can help test integration points without requiring real services, thus speeding up testing. The document provides an example of how IBM products like Rational Integration Tester and Rational Test Virtualization Server can be used to continuously test applications and their interactions with virtualized services as part of the development and deployment process.
BPM for agile development & minimizing SAP customization (Logan Vadivelu)
IBM provides expertise in using business process management (BPM) to develop business applications that minimize customization of SAP systems. BPM allows separating business logic, user interfaces, and reporting for improved flexibility. It also enables rapid, iterative application development by modeling processes, simulating changes, and deploying updates. IBM discusses using BPM to develop applications for human resources processes like hiring and expenses. Customers have used IBM BPM to streamline HR and other processes, reducing times by up to 90% while improving visibility and productivity. IBM offers workshops to assess how organizations can leverage BPM for their SAP processes.
The document describes Continental AG's planning and successful migration to IBM Connections 4.5. It discusses Continental's internal project team, the project phases and timeline, challenges of migrating a large deployment with many customizations, and the step-by-step side-by-side migration procedure used, including installation, test data migration, configuration migration, customization migration, and final data migration. Key success factors included ensuring stakeholder alignment, allowing buffer times, having subject matter experts but also personal knowledge, preparing tools, getting user feedback, taking iterative small steps with testing, and writing detailed checklists.
This document provides an overview and introduction to IBM Data Server Manager (DSM). It summarizes new features of DSM 2.1 for DB2 for z/OS, including enhancements to query tuning, configuration management, and utilities solution pack integration. The document highlights key customer requests that DSM addresses, such as simplified user experience, improved performance tuning, and centralized database administration across the enterprise. Screenshots and descriptions of DSM capabilities like database exploration, SQL development and scheduling, advisor-based query tuning, and client/server monitoring are also provided.
The webinar introduced linkTuner, a tool from Fishbowl Solutions that simulates CAD user activity across a network to benchmark and measure the performance of a PDM system. LinkTuner automates the process of testing searches, revisions, downloads and other tasks to provide empirical data on system performance with different versions of the software. It can test the same benchmark at multiple locations simultaneously or load test a system prior to going live. The results are logged with granularity to analyze performance by task, user and over multiple runs. A demo then showed how linkTuner works.
This document discusses how ADP uses IBM's DB2 Query Monitor and Optim Query Workload Tuner tools to proactively and reactively tune database performance. ADP runs a daily report to identify the top 25 CPU intensive jobs. They use Query Monitor to drill down and identify inefficient SQL statements. The Optim Query Workload Tuner advisors are then used to analyze the SQL and provide recommendations to improve performance, such as suggested indexes or query rewrites. ADP has seen success stories where implementing the recommendations led to reductions in CPU usage of up to 80%. They also conduct regular reviews of new and modified SQL using these tools.
Effective Integration of SAP MDM & BODS (NavneetGiria)
The document discusses the effective integration of SAP Master Data Management (MDM) and SAP Business Objects Data Services (BODS). It provides examples of how BODS can be integrated with MDM for ETL/data integration and data quality processes. The integration enables capabilities like initial data loads, incremental updates, and central master data maintenance. BODS tools help with tasks like data profiling, impact analysis, and transformation. Together, MDM and BODS provide combined data governance, consolidation, and maintenance capabilities.
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat... (Data Con LA)
Curtis ODell, Global Director Data Integrity at Tricentis
Join me to learn about a new end-to-end data testing approach designed for modern data pipelines that fills dangerous gaps left by traditional data management tools—one designed to handle structured and unstructured data from any source. You'll hear how you can use unique automation technology to reach up to 90 percent test coverage rates and deliver trustworthy analytical and operational data at scale. Several real world use cases from major banks/finance, insurance, health analytics, and Snowflake examples will be presented.
Key Learning Objectives
1. Data journeys are complex, and you have to ensure the integrity of the data end to end across that journey, from source to final reporting, for compliance.
2. Data management tools do not test data; they profile and monitor at best, leaving serious gaps in your data testing coverage.
3. Automation, integrated with DevOps and DataOps CI/CD processes, is key to solving this.
4. How this approach has an impact in your vertical.
Similar to IBM InfoSphere Stewardship Center for IIS DQEC (20)
IBM leads the way in Hadoop and Spark, providing the keys to unlocking value from big data. IBM's approach enables faster adoption of these technologies through open source innovation, standards-based technologies, familiar interfaces that integrate with existing tools, and advanced analytics capabilities. IBM is committed to continued innovation in these areas and sees big data adoption as an ongoing process of increasing maturity levels.
IBM DB2 with BLU Acceleration provides several key benefits for analytical workloads:
1) It is easy to implement and maintain, requiring only the creation of tables and loading of data.
2) It provides extreme data compression and column-oriented data storage for improved performance and reduced storage needs (see the sketch after this list).
3) It utilizes several techniques like data skipping, multi-processor parallelism, and Single Instruction Multiple Data (SIMD) CPU instructions to further accelerate performance.
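As a hedged illustration of point 2, creating a column-organized table is a one-line DDL change; the sketch below uses the ibm_db Python driver with an invented connection string and table.

```python
# Sketch: create a column-organized (BLU) table via the ibm_db driver.
# Connection string and table definition are invented examples.
import ibm_db

conn = ibm_db.connect(
    "DATABASE=testdb;HOSTNAME=localhost;PORT=50000;UID=db2inst1;PWD=secret;",
    "", "")
ibm_db.exec_immediate(conn, """
    CREATE TABLE sales_fact (
        sale_date DATE,
        store_id  INTEGER,
        amount    DECIMAL(12, 2)
    ) ORGANIZE BY COLUMN
""")
```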
IBM's InfoSphere Master Data Management v11 features a unified MDM solution that supports virtual, physical and hybrid implementation styles within a single instance. It provides enhanced governance capabilities, improved support for reference data management and advanced hierarchies. The release also aims to accelerate time to value through simplifying upgrades, pre-built accelerators and modularity. Additionally, v11 further integrates MDM with big data and analytics capabilities, allowing the augmentation of master data with insights from unstructured sources.
InfoSphere Streams Technical Overview - Use Cases Big Data - Jerome CHAILLOUX (IBMInfoSphereUGFR)
IBM InfoSphere Streams is a platform for processing streaming data in real-time. It allows for the construction of application graphs where data continuously flows between operators. The platform can handle high data volumes and varieties, providing low-latency analysis. It includes various pre-built operators and toolkits for integration, analytics, text processing, and more. Streams supports the development of applications across multiple nodes in a cluster and can automatically distribute and parallelize processing.
This document provides an overview of IBM InfoSphere Streams, a platform for real-time analytics on big data. It discusses key features such as handling high data volumes and varieties at tremendous velocities, and the ability to perform analytics with microsecond latency. It also summarizes the types of problems that can be solved using InfoSphere Streams, including applications that require real-time processing, filtering and analysis of streaming data from various sources.
IBM InfoSphere MDM v10.1 includes several new features:
1) Advanced business rules capabilities through integration with IBM Operational Decision Manager (ODM), allowing rules to be managed through a single interface.
2) Advanced catalog management features for WebSphere Commerce, improving eCommerce operations through a tailored data model and integration framework.
3) Enhancements to the collaborative edition, including improved user interfaces, in-line editing, and new capabilities for managing business rules.
4) Reference data management hub for centralized governance of reference data through role-based access, versioning, and lifecycle management.
5) Master data governance tools including policy administration, monitoring of data quality