This document provides an agenda and overview for a workshop on introducing agile business intelligence sustainably. The workshop schedule includes sessions on what business intelligence is, introducing agile BI building blocks, building-block details, BI-specific testing, and a retrospective. The presenter's slides and exercises are available online, and the presenter's background and credentials in BI, agile methods, and various industry organizations are also provided.
Deliver Trusted Data by Leveraging ETL Testing (Cognizant)
We explore how extract, transform and load (ETL) testing with SQL scripting is crucial to data validation and show how to test data on a large scale in a streamlined manner with an Informatica ETL testing tool.
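As an illustration of the kind of validation such ETL tests perform, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in for the actual Informatica tooling; the table names and the row-count reconciliation check are illustrative assumptions, not the tool's API.

```python
import sqlite3

def rowcount_reconciliation(conn, source_table, target_table):
    """Basic ETL completeness check: did every source row land in the target?"""
    cur = conn.cursor()
    src = cur.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = cur.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return src, tgt, src == tgt

# Tiny in-memory demo: load a "source", run a trivial transform into a "target",
# then validate that no rows were dropped along the way.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders(id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    CREATE TABLE tgt_orders(id INTEGER, amount REAL);
    INSERT INTO tgt_orders SELECT id, amount FROM src_orders;
""")
src, tgt, ok = rowcount_reconciliation(conn, "src_orders", "tgt_orders")
```

In practice the same pattern extends to checksum and column-level comparisons run as SQL scripts against source and target systems.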
How can a quality engineering and assurance consultancy keep you ahead of others? (greyaudrina)
Quality Engineering and Assurance has undergone a drastic change due to digital transformation. A business that deploys automation is in a better position to handle the competition and offer a world-class customer experience.
Using Data Science to Build an End-to-End Recommendation System (VMware Tanzu)
We get recommendations every day: Facebook recommends people we should connect with; Amazon recommends products we should buy; and Google Maps recommends routes to take. What all these recommendation systems have in common is data science and modern software development.
Recommendation systems are also valuable for companies in industries as diverse as retail, telecommunications, and energy. In a recent engagement, for example, Pivotal data scientists and developers worked with a large energy company to build a machine learning-based product recommendation system to deliver intelligent and targeted product recommendations to customers to increase revenue.
In this webinar, Pivotal data scientist Ambarish Joshi will take you step-by-step through the engagement, explaining how he and his Pivotal colleagues worked with the customer to collect and analyze data, develop predictive models, and operationalize the resulting insights and surface them via APIs to customer-facing applications. In addition, you will learn how to:
- Apply agile practices to data science and analytics.
- Use test-driven development for feature engineering, model scoring, and validating scripts.
- Automate data science pipelines using pyspark scripts to generate recommendations.
- Apply a microservices-based architecture to integrate product recommendations into mobile applications and call center systems.
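The test-driven development bullet above can be sketched as follows: a hypothetical feature-engineering step for per-customer energy usage (the record layout and field names are invented for illustration), written so that unit tests pin down its contract before the pipeline uses it.

```python
def engineer_features(usage_records):
    """Derive simple per-customer features from raw usage records.

    Each record is a dict like {"customer": str, "kwh": float}; the output maps
    each customer to total and average consumption, ready for model scoring.
    """
    totals, counts = {}, {}
    for rec in usage_records:
        c = rec["customer"]
        totals[c] = totals.get(c, 0.0) + rec["kwh"]
        counts[c] = counts.get(c, 0) + 1
    return {c: {"total_kwh": totals[c], "avg_kwh": totals[c] / counts[c]}
            for c in totals}

# Sample input of the kind a test written first (TDD-style) would exercise.
records = [
    {"customer": "a", "kwh": 10.0},
    {"customer": "a", "kwh": 30.0},
    {"customer": "b", "kwh": 5.0},
]
features = engineer_features(records)
```

In a real engagement the same function would be expressed over Spark DataFrames, but the test-first contract stays the same.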
Presenters: Ambarish Joshi and Jeff Kelly, Pivotal
Microservices Approaches for Continuous Data Integration (VMware Tanzu)
How can businesses modernize their existing data integration flows? How can they connect a rapidly growing number of data services? How can they capture, process, and generate new event streams? How can they leverage advances in machine learning to enhance real-time interactions with their customers?
Join Matt Aslett, Research Director at 451 Research, and Jürgen Leschner from Pivotal for an interactive discussion about continuous data integration applications, trends, and architectures.
In this webinar you will learn:
- How traditional data integration approaches like batch ETL can be improved
- Why microservices support continuous data integration in a scalable way
- How to incorporate DevOps practices in your data integration teams
- What benefits microservices and DevOps practices bring to data integration
Presenters: Jürgen Leschner, Pivotal, and Matt Aslett, Research Director, 451 Research
Data Services and the Modern Data Ecosystem (ASEAN) (Denodo)
Watch full webinar here: https://bit.ly/2YdstdU
Digital Transformation has changed the way IT information services are delivered. The pace of business engagement and the rise of Digital IT (formerly known as “Shadow IT”) have also increased demands on IT, especially in the area of Data Management.
Data Services exploit widely adopted interoperability standards, providing a strong framework for information exchange. Combined with Data Virtualization, they have also enabled the growth of robust systems of engagement that can now exploit information that was previously locked away in internal silos.
We will discuss how a business can easily support and manage a Data Services platform, providing a more flexible approach to information sharing that supports an ever more diverse community of consumers.
Watch this on-demand webinar as we cover:
- Why Data Services are a critical part of a modern data ecosystem
- How IT teams can manage Data Services and the increasing demand by businesses
- How Digital IT can benefit from Data Services, and how this supports the need for rapid prototyping, allowing businesses to experiment with data and fail fast where necessary
- How a good Data Virtualization platform can encourage a data culture amongst business consumers (internally and externally)
6 Steps to Richer Visualizations Using Alteryx for Microsoft Power BI (updated) (Phillip Reinhart)
Microsoft Power BI enables analysts to deliver incredible data-driven insights
and visualizations to their organizations. As decision makers recognize the value
of visual analytics produced in Microsoft Power BI, analysts must find ways
of dealing with the increasing volumes and complexity of the data required to
get to these insights and visualizations. For Microsoft Power BI users this is a
critical and often time-consuming process, much of which revolves around
blending data from multiple sources to create an actionable analytic dataset.
As a result, analysts spend many days dealing with:
• Wasted time waiting for others to get them the right data for their analysis
• Manual preparation and integration of different data sets
• A lack of advanced analytics that many decisions require
Alteryx provides the advanced data blending capabilities required to reduce
the time and effort to create the perfect dataset for a Microsoft Power BI
visualization. This cookbook shows you how you can quickly blend multiple
sources of data in order to create richer visualizations in Microsoft Power BI.
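As a rough sketch of what such a blending step does under the hood, here is a minimal inner join of two record sets in plain Python; Alteryx performs this visually and at scale, so the function and field names below are purely illustrative.

```python
def blend(left, right, key):
    """Inner-join two lists of dicts on `key`, merging their columns.

    This is the essence of a 'blend' step: one analytic dataset out of
    two sources that share a common key.
    """
    index = {row[key]: row for row in right}
    out = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            merged = dict(row)   # copy so the inputs stay untouched
            merged.update(match)
            out.append(merged)
    return out

# Two hypothetical sources: sales actuals and sales targets, keyed by region.
sales = [{"region": "East", "revenue": 120}, {"region": "West", "revenue": 80}]
targets = [{"region": "East", "target": 100}, {"region": "West", "target": 90}]
dataset = blend(sales, targets, "region")
```

The resulting rows carry both `revenue` and `target`, which is exactly the shape a Power BI visual comparing actuals to targets needs.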
Patterns provide structure and clarity, enabling architects to establish their solutions across the enterprise. Moreover, these software patterns help to link technology and business requirements in an effective and efficient manner. Patterns help to incorporate robust solutions for business problems thanks to their wide adoption and reusability. In addition, patterns create a common method to communicate, document and describe solutions. This session will explain some of these patterns, ranging from SOA (Service-Oriented Architecture) and WOA (Web-Oriented Architecture) to EDA (Event-Driven Architecture) and the IoT (Internet of Things).
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc... (Denodo)
Watch full webinar here: https://bit.ly/3offv7G
Presented at AI Live APAC
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark and complex libraries for R, Python and Scala put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Watch this on-demand session to learn how companies can use data virtualization to:
- Create a logical architecture that makes all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc.
Integrating Structure and Analytics with Unstructured Data (DATAVERSITY)
How can you make sense of messy data? How do you wrap structure around non-relational, flexibly structured data? With the growth in cloud technologies, how do you balance the need for flexibility and scale with the need for structure and analytics? Join us for an overview of the marketplace today and a review of the tools needed to get the job done.
During this hour, we'll cover:
- How big data is challenging the limits of traditional data management tools
- How to recognize when tools like MongoDB, Hadoop, IBM Cloudant, R Studio, IBM dashDB, CouchDB, and others are the right tools for the job.
Orchestrate data with agility and responsiveness. Learn how to manage a commo... (Skender Kollcaku)
Data is one of the most important assets an organization has
because it defines each organization’s uniqueness.
Being a data-driven organization is not the final objective,
but it represents a crucial process in the innovation challenge.
Data integration will remain a pressing issue for complex and fast-growing companies that share datasets with vendors, partners and increasingly connected customers. The need to integrate systems is not new, but now, thanks to computational power and technology evolution, we can achieve this in real time.
Implement big data testing in order to successfully generate analytics. This blog is ideal for software testers and anyone else who wants to understand big data testing.
Introduction to Data Virtualization (session 1 from Packed Lunch Webinar Series) (Denodo)
This first session in a series of six ‘Packed Lunch’ webinars provides an overview of Data Virtualization technology, its applications and how it is adding business value to organizations around the world.
More information and FREE registrations to this webinar: http://goo.gl/z7mq2S
Landing page for the entire Packed Lunch webinar series: http://goo.gl/NATMHw
Attend & get unique insights into:
What Data Virtualization is and what sets it apart from traditional integration tools
How it both complements and leverages existing enterprise architectures
The Denodo Data Virtualization platform and its capabilities
Power BI Advanced Data Modeling Virtual Workshop (CCG)
Join CCG and Microsoft for a virtual workshop, hosted by Solution Architect Doug McClurg, to learn how to create professional, frustration-free data models that engage your customers.
Agile Testing Days 2017: Introducing Agile BI Sustainably - Exercises (Raphael Branger)
“We now do Agile BI too” is often heard in today's BI community. But can you really “create” agility in Business Intelligence projects? This presentation shows that Agile BI doesn't necessarily start with the introduction of an iterative project approach. An organisation is well advised to first establish the necessary foundations with regard to organisation, business and technology in order to become capable of an iterative, incremental project approach in the BI domain.
In this session you learn which building blocks you need to consider, and what a meaningful sequence for these building blocks is. Selected aspects such as test automation, BI-specific design patterns and the Disciplined Agile Framework will be explained in more practical detail.
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming more and more important in a world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis later: you have to be able to include part of your analytics right after you consume the data streams.
Products for event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the past few years another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open-source products/frameworks such as Apache Storm, Spark Streaming, Flink and Kafka Streams, as well as supporting infrastructures such as Apache Kafka. In this talk I will present the theoretical foundations of Stream Processing, discuss the core properties a Stream Processing platform should provide, and highlight the differences you might find between the more traditional CEP and the more modern Stream Processing solutions.
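To make the idea concrete, the core of a streaming aggregation, counting events per key in fixed (tumbling) time windows, can be sketched in a few lines of Python; real platforms like Kafka Streams or Spark Streaming add distribution, state stores and fault tolerance on top of this, and the event layout below is an invented example.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """Count events per (window, key) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp, key) pairs; each event is assigned
    to the window starting at the largest multiple of `window_size` <= timestamp.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_size)
        counts[(window_start, key)] += 1
    return dict(counts)

# Hypothetical sensor events: (timestamp, sensor id), aggregated in 5-unit windows.
events = [(1, "sensor-a"), (3, "sensor-a"), (6, "sensor-b"), (7, "sensor-a")]
result = tumbling_window_counts(events, window_size=5)
```

The analysis happens as the events are consumed, before anything is persisted, which is precisely the low-latency property the paragraph above describes.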
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o... (Daniel Zivkovic)
Two #ModernDataStack talks and one DevOps talk: https://youtu.be/4R--iLnjCmU
1. "From Data-driven Business to Business-driven Data: Hands-on #DataModelling exercise" by Jacob Frackson of Montreal Analytics
2. "Trends in the #DataEngineering Consulting Landscape" by Nadji Bessa of Infostrux Solutions
3. "Building Secure #Serverless Delivery Pipelines on #GCP" by Ugo Udokporo of Google Cloud Canada
We ran out of time for the 4th presenter, so the event will CONTINUE in March... stay tuned! Compliments of #ServerlessTO.
What does an event mean? Manage the meaning of your data! | Andreas Wombacher... (HostedbyConfluent)
Van Oord, a 150-year-old family-owned business, builds wind farms at sea, lays subsea cables, performs dredging, and constructs infrastructure (dikes, etc.). It operates worldwide, often using its own specialized vessels. A well-known prestigious project is the creation of the palm island off the coast of Dubai.
Data management at Van Oord is still in its infancy. The current operation is based on bilateral data exchange, without an Enterprise Service Bus or major data warehouse infrastructure. In 2020 Van Oord started a PoC with Confluent Kafka, executing a wide range of use cases and requirements, followed by a formal program to implement a sustainable data platform.
Data owners publish an information product, i.e. a set of Kafka topics to communicate change (a la CDC) and topics for sharing the state of a data source (Kafka tables). The information product owner is responsible for granting access and assuring data quality, data lineage and governance. The set of all information products forms the enterprise data model.
This talk outlines why Van Oord requires data governance and enterprise architecture models integrated with Confluent Kafka, and demonstrates how an open-source data governance tool is integrated with Confluent Kafka to fulfil these requirements.
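The changelog-plus-table duality behind such an information product can be sketched in plain Python, with dictionaries standing in for actual Kafka topics; the event fields and keys below are invented for illustration.

```python
def apply_change(state, changelog, event):
    """Apply a CDC-style change event to a keyed state table.

    The event is appended to `changelog` (the role of a change topic) and
    folded into `state` (the role of a compacted table / Kafka table view).
    """
    changelog.append(event)
    key = event["key"]
    if event["op"] == "delete":
        state.pop(key, None)
    else:  # "insert" or "update" both overwrite the current value
        state[key] = event["value"]
    return state

# Hypothetical lifecycle of one record flowing through the information product.
state, changelog = {}, []
apply_change(state, changelog, {"op": "insert", "key": "vessel-1", "value": {"status": "docked"}})
apply_change(state, changelog, {"op": "update", "key": "vessel-1", "value": {"status": "at sea"}})
apply_change(state, changelog, {"op": "delete", "key": "vessel-1", "value": None})
```

The changelog preserves full history for downstream consumers, while the state view always reflects the latest value per key, mirroring the change-topic/table split described above.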
Enterprise and multi-tier Power BI deployments with Azure DevOps (Marc Lelijveld)
In Power BI we are used to creating reports and dashboards really quickly, but in most cases we forget to think about governance, development and maintenance at an enterprise-wide scale.
During this session I share some best practices for applying DTAP (Development, Test, Acceptance and Production), better known as multi-tier deployment.
By using Azure DevOps for deployment we bring back structure and use a self-service tool in an enterprise environment. Besides deployment, there is also version control and enterprise roll-out of your content in a managed structure.
In this session:
- Azure DevOps
- PowerShell
- Power BI REST API
Data Ingestion in Big Data and IoT platforms (Guido Schmutz)
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It's important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the past few years some new tools have emerged which are especially capable of handling this process of integrating data from outside, often called Data Ingestion. From the outside they look very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed message-level monitoring, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache NiFi, StreamSets and the Kafka ecosystem and show how they handle data ingestion in a Big Data solution architecture.
An example of the BI application technology comparison, based on customer needs and application capabilities, performed by DWApplications.
This is one of 3 deliverables in the free BI Roadmap Assessment provided by DWApplications.
- BI application technology comparison
- Current and future state assessment
- Timeline, resource and implementation plan
If you are interested in a free BI roadmap assessment,
contact: scott.mitchell@dwapplications.com
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu... (DianaGray10)
This session is focused on the art of application architecture, where we unravel the intricacies of creating a standard, yet dynamic application structure.
We'll explore:
Essential components of a typical application, emphasizing their roles and interactions.
Learn how to connect UiPath RPA Processes, UiPath Apps, and Data Service together to build a stronger app.
Gain insights into building more efficient, interconnected, and robust applications in the UiPath ecosystem.
Speaker:
David Kroll, Director, Product Marketing @Ashling Partners and UiPath MVP
Watch full webinar here: https://buff.ly/2XXbNB7
Having started out as the most agile and real-time enterprise data fabric, Data Virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
*What data virtualization really is
*How it differs from other enterprise data integration technologies
*Why data virtualization is finding enterprise wide deployment inside some of the largest organizations
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try to reduce the work per iteration, and the other is to try to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, i.e. those with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated afterwards. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
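A minimal sketch of the first of these optimizations, skipping vertices whose ranks have already converged, might look like the following; note that naive skipping without re-activating vertices whose in-neighbors later change can cost accuracy, so this is an illustration of the idea rather than a production implementation.

```python
def pagerank(graph, damping=0.85, tol=1e-8, max_iter=100):
    """Power-iteration PageRank that skips recomputation of converged vertices.

    `graph` maps each vertex to its list of out-neighbors; no dangling nodes
    are assumed (every vertex has at least one out-link).
    """
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    out_deg = {v: len(graph[v]) for v in graph}
    # Build reverse adjacency so each vertex pulls rank from its in-neighbors.
    in_nbrs = {v: [] for v in graph}
    for u, nbrs in graph.items():
        for v in nbrs:
            in_nbrs[v].append(u)
    converged = set()
    for _ in range(max_iter):
        new_ranks = dict(ranks)
        for v in graph:
            if v in converged:
                continue  # the work-per-iteration saving described above
            r = (1 - damping) / n + damping * sum(
                ranks[u] / out_deg[u] for u in in_nbrs[v]
            )
            if abs(r - ranks[v]) < tol:
                converged.add(v)
            new_ranks[v] = r
        ranks = new_ranks
        if len(converged) == n:
            break
    return ranks

# Tiny 3-vertex cycle: by symmetry every vertex settles at rank 1/3.
graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

On this symmetric example all vertices converge immediately; on real graphs the converged set grows gradually, shrinking the per-iteration frontier.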
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads and is expected to be a non-issue when the computation is performed on massive graphs.
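The levelwise scheme described in the abstract can be sketched as follows. This is a rough illustration on a toy graph: the SCC decomposition and its topological levels are assumed to be already known (here hard-coded), the graph must have no dead ends, and the damping and tolerance values are illustrative, not taken from the report.

```python
def levelwise_pagerank(adj, levels, d=0.85, tol=1e-12, max_iter=200):
    """adj: vertex -> list of out-neighbours; levels: list of levels,
    each level a list of components (lists of vertices)."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    in_nbrs = {v: [] for v in adj}
    for u, outs in adj.items():
        for v in outs:
            in_nbrs[v].append(u)
    for level in levels:
        for comp in level:          # components in one level are independent
            for _ in range(max_iter):
                delta = 0.0
                for v in comp:
                    # Ranks from earlier levels are already final, so only
                    # this component's ranks still change while iterating.
                    s = sum(rank[u] / len(adj[u]) for u in in_nbrs[v])
                    new = (1.0 - d) / n + d * s
                    delta = max(delta, abs(new - rank[v]))
                    rank[v] = new
                if delta < tol:
                    break
    return rank

# Two SCCs: {a, b} feeds {c, d}; no vertex is a dead end.
adj = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['d'], 'd': ['c']}
ranks = levelwise_pagerank(adj, levels=[[['a', 'b']], [['c', 'd']]])
```

Because each component is finished before its successors start, components in the same level could be computed in parallel without any per-iteration communication, which is the point of the scheme.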
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group (“MCG”) expects demand to keep growing and supply to evolve, facilitated by institutional investment rotating out of offices amid the shift to work from home (“WFH”), alongside the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x by value in 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time systems, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead Prasad and Procure.FYI's Co-Founder.
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms like PageRank operate on a graph representation; Compressed Sparse Row (CSR) is an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential vs OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs OpenMP-based vector element sum.
2. Performance of memcpy-based vs in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
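The float-vs-bfloat16 experiment above is about the accuracy side of the storage-type trade-off. A rough way to see the effect without CUDA is to emulate bfloat16 in plain Python by truncating a float32 to its top 16 bits; this sketch is illustrative only and uses truncation rather than proper rounding.

```python
import struct

def to_bfloat16(x):
    """Truncate a Python float to bfloat16 precision (via float32)."""
    (bits,) = struct.unpack('<I', struct.pack('<f', x))
    (trunc,) = struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))
    return trunc

def sum_bf16(values):
    # Keep the running total in emulated bfloat16, as a bfloat16
    # accumulator (storage type) would.
    total = 0.0
    for v in values:
        total = to_bfloat16(total + to_bfloat16(v))
    return total

vals = [0.001] * 10000
exact = sum(vals)       # ~10.0 in double precision
lossy = sum_bf16(vals)  # drifts badly once increments fall below the ulp
```

Because bfloat16 keeps only 8 significant bits, the running total eventually becomes so coarse that adding another small element changes nothing, which is why real low-precision reductions accumulate in a wider type or use pairwise/Kahan summation.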
Agile Testing Days 2017 Introducing AgileBI Sustainably
1. My slides are / will be available for you at:
Introducing Agile Business Intelligence Sustainably: Implement the Right Building Blocks in the Right Order
Raphael Branger, IT-Logix AG
Presentation: http://bit.ly/2zBpSvz
Exercises: http://bit.ly/2hvVVGF
2. Welcome & Overview of Workshop Schedule (14:25 – 14:30)
What is Business Intelligence? (14:30 – 15:10)
Introduction to the Agile BI Building Blocks (15:10 – 15:40)
Break (15:40 – 16:10)
Building Block details (16:10 – 17:10)
User Stories
BI-specific Testing
Retrospective (17:10 – 17:25)
Agenda
3. Raphael Branger, Senior BI Solution Architect, IT-Logix AG, Switzerland
Working in Business Intelligence & Data Warehousing since 2002
Looking at «Agile» in the context of BI since 2010
Actively contributing to the community…
http://rbranger.wordpress.com/ (English)
http://blog.it-logix.ch/author/raphael-branger/ (German)
Regular conference engagements
Follow me on Twitter: @rbranger
Member of…
TDWI www.tdwi.eu/ & https://tdwi.org
Disciplined Agile Consortium http://www.disciplinedagileconsortium.org/
Scrum Breakfast Club http://scrumbreakfast.club/
International Business Communication Standards (IBCS) Association http://www.ibcs-a.org
About me
5. Grab some Post-its
Per note, write down one key word or sentence that you associate with BI & DWH
Does your company use BI & DWH?
Are you yourself an end user / developer etc. working with the BI & DWH system?
Any good or bad experience with BI & DWH systems?
…
After a few minutes, we will start to collect the notes and hear everyone's short explanation.
What are your associations with Business Intelligence & Data Warehousing?
6. A typical BI asset
What do we need to build and run this little dashboard app?
7. Exercise 1 «BI Overview»
Per group of three or four, take 2 empty canvas sheets.
Take the pictures and try to stick them to the appropriate place on one of the canvases.
Take the text blocks and try to stick them to the appropriate place on the second canvas.
Timebox: 10 mins
Afterwards we'll take some time to discuss the BI overview together.
8. [BI overview canvas: internal and external source systems feed the DWH (staging/integration layer, Data Marts 1–3) and a Data Lake; on top sit the BI applications (reports, analysis, dashboards, ad-hoc, predictions via Data Science & AI) serving internal users, external users and customers. The whole is framed by the BI strategy (vision, mission, objectives, partial strategies, derived from the business and IT strategy), organisation & processes (management, development, operations, governance across inception, construction and transition), and technical, business and process metadata.]
11. Exercise 2 «Agile BI Building Blocks»
Take the overview sheet with the numbered AgileBI Building Blocks.
Take one of the available building block detail sheets.
Study the page and think about which building block number is yours.
Once it is your turn, stick your page near the corresponding block on the wall.
We'll briefly discuss the building block.
13. Implement Vertical Slices in Order to Prioritize.
Let's have a look at two approaches regarding how to implement a BI system:
Only a vertical slicing of the implementation (aka Features) allows for ongoing (re-)prioritization of requirements.
[Diagram: two cumulative-progress charts. Building horizontally, layer by layer (Connectivity, DWH, Data Mart, BI App), progress passes 5%, 25% and 50% and reaches 100% only at the very end; building vertically by features (Feature 1, 2, 3), each finished feature delivers usable progress (50%, 61%, 100%).]
14. From Feature To User Story
Feature 1 → User Stories
User Stories have an «a priori» maximum duration, e.g. 1 or 2 days. Why? Because we force ourselves into a more frequent and shorter feedback cycle.
Short user stories are the foundation for answering whether the project progresses / «flows» as desired or not.
RTS = Runnable & Tested Stories are the real progress indicator in a project!
Epics per layer:
BI Application Epics: report with monthly layout; report with weekly layout; report with variable measure selection; …
DWH Epics: 1 fact table with a non-monetary measure (e.g. quantity) + time dimension + product dimension (without hierarchy); additional measure; extend product dimension with a hierarchy; …
Connectivity & Infrastructure Epics: setup middleware; manual import; automated import; …
15. Testing the User Story
Feature 1 → User Stories. How a user story is tested depends on its layer:
BI Application: directly within the application, or via a query in Excel
DWH: via a query in Excel, or via a query in a database tool, e.g. SQL Server Mgt. Studio
Connectivity & Infrastructure
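The «query in a database tool» approach can be turned into an automated check: recompute the report's measure straight in SQL and compare it with the number the BI front end shows. A minimal sketch with SQLite standing in for the DWH; the table follows the exercise schema, and the report value is hard-coded for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FactEventParticipant (EventID INT, ParticipantID INT, NoShow TEXT);
    INSERT INTO FactEventParticipant VALUES (1, 10, 'N'), (1, 11, 'N'), (1, 12, 'Y');
""")

def participants_in_dwh(event_id):
    # Recompute the "number of registered participants" measure in SQL.
    row = conn.execute(
        "SELECT COUNT(*) FROM FactEventParticipant WHERE EventID = ?",
        (event_id,),
    ).fetchone()
    return row[0]

value_shown_in_report = 3   # would come from the BI front end in a real test
assert participants_in_dwh(1) == value_shown_in_report
```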
16. Exercise 3 «BI User Stories»
Gather together in teams of two to four people. Take the exercise sheet handed out.
[Scenario: a Roundtable Registration System (web service or CSV export) and the TDWI Membership System (SQL Server) are loaded via a DWH automation tool into a star schema:
FactEventParticipant: RegisterDate, EventID, ParticipantID, NoShow (Y/N); measure: count of participants
DimEvent: EventDate, Country, City, Venue Address, Location (Geo), Max. Participants
DimDate_Register: DateValue
DimParticipant: Name, Member Category]
Define at least three user stories. Remember: each user story should be small enough to be implemented in 1 single day.
Timebox 10 minutes.
17. Possible User Stories (Connectivity & Infrastructure)
Feature (following the regular user story schema):
As a TDWI Backoffice employee, I need to see the number of registered participants for a Roundtable event so that I can organize the logistics for this event.
Connectivity Epic (following the FDD schema: <action> the <result> <by|for|of|to> <object>):
Extract the event and participant data of the web based Roundtable Registration System to a CSV file.
Connectivity User Stories (following the FDD schema):
Manually export the event and participant data for all events to a CSV file.
Schedule the export and save the event and participant data for all events to a CSV file on the FTP server.
Download the event and participant data for all events to a local folder on the DWH server.
Load the event and participant data for all events to a load table (a 1:1 copy, built with the DWH automation tool) in the DWH database.
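The last connectivity story, loading a CSV export 1:1 into a load table, might look like this in miniature. The file contents and table layout are made up for illustration; in practice a DWH automation tool would generate this kind of load code.

```python
import csv, io, sqlite3

# Stand-in for the CSV export from the Roundtable Registration System.
csv_export = io.StringIO("EventID,ParticipantID,NoShow\n1,10,N\n1,11,Y\n")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE load_event_participant (EventID TEXT, ParticipantID TEXT, NoShow TEXT)"
)

# 1:1 copy: no transformation, every CSV column lands in the load table as text.
reader = csv.DictReader(csv_export)
conn.executemany(
    "INSERT INTO load_event_participant VALUES (:EventID, :ParticipantID, :NoShow)",
    reader,
)
```

Keeping the load table an untyped 1:1 copy makes this story small enough for a single day; typing, cleansing and historization belong to later DWH stories.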
18. Possible User Stories (DWH)
Feature (following the regular user story schema):
As a TDWI Backoffice employee, I need to see the number of registered participants for a Roundtable event so that I can organize the logistics for this event.
DWH Epic (following the FDD schema: <action> the <result> <by|for|of|to> <object>):
Model and load the event and participant data of the web based Roundtable Registration System to the DWH and Data Mart.
DWH User Stories (following the FDD schema):
Model and (full) load the event master data (without Location / Geo info, not historized) to DimEvent on the DWH layer.
Model and (full) load the participant master data (without Member Category, not historized) to DimParticipant on the DWH layer.
Model and (full) load the event registration transaction data to FactEventParticipant on the DWH layer.
Refactor the existing load implementation to allow for incremental loads.
Create and develop the data mart for FactEventParticipant, DimEvent and DimParticipant with «Number of Participants» as its first measure.
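A sketch of what the first DWH story's «full load» could mean in code: truncate-and-insert from the load layer into DimEvent, with no historization, matching the story's deliberately narrow scope. Table names and columns are illustrative; a real load would be generated by the DWH automation tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE load_event (EventID INT, EventDate TEXT, City TEXT);
    CREATE TABLE DimEvent   (EventID INT, EventDate TEXT, City TEXT);
    INSERT INTO load_event VALUES (1, '2017-11-06', 'Potsdam'), (2, '2017-12-01', 'Zurich');
""")

def full_load_dim_event(conn):
    # Full load: wipe the dimension and rebuild it from the load layer.
    # No history is kept, which is exactly the "not historized" scoping.
    conn.execute("DELETE FROM DimEvent")
    conn.execute("INSERT INTO DimEvent SELECT EventID, EventDate, City FROM load_event")

full_load_dim_event(conn)
```

Because the load is a wipe-and-rebuild, it is idempotent: running it twice leaves the same rows, which also makes the later «refactor to incremental loads» story a clean, separate step.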
19. Possible User Stories (BI Application)
Feature (following the regular user story schema):
As a TDWI Backoffice employee, I need to see the number of registered participants for a Roundtable event so that I can organize the logistics for this event.
BI Application Epic (following the regular user story schema):
As a TDWI Backoffice employee, I need a BI application to see the number of registered participants for a Roundtable event so that I can organize the catering for this event.
BI Application User Stories (following the regular user story schema):
As a TDWI Backoffice employee, I need to see the number of registered participants for the next Roundtable in a selected location so that I can organize the catering for this event.
As a TDWI Backoffice employee, I need to see the percentage of «No-Shows» for the past 10 roundtables in a selected location so that I can optimize the catering for upcoming events.
As a TDWI Backoffice employee, I need to be alerted if the number of participants for the next Roundtable in any location is at 90% of the maximum capacity so that I can check if a larger room is available.
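The alerting story boils down to a small, testable rule: flag every event whose registrations reach 90% of «Max. Participants». A pure-Python stand-in for what a BI tool's alerting feature would evaluate; the event names and numbers are made up.

```python
def events_to_alert(events, threshold=0.9):
    """events: iterable of (name, registered, max_participants) tuples."""
    return [name for name, registered, cap in events
            if cap and registered / cap >= threshold]

events = [("Zurich Roundtable", 27, 30),   # 27/30 = 90% -> alert
          ("Bern Roundtable", 10, 30)]     # 33% -> no alert
print(events_to_alert(events))             # → ['Zurich Roundtable']
```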
21. Where do we test? (1/2): Intra-System-Tests
Each system component is tested on its own.
[Diagram: Source System → ETL → Staging → Data Warehouse → Marts / Cubes / Semantic Layer → Reports, with a separate test attached to each component.]
22. Where do we test? (2/2): Inter-System-Tests
An external test tool acts independently of the system and its properties (and possibly its errors).
[Diagram: the same chain from source system to reports, with a single external testing instance spanning all components.]
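A minimal example of such an inter-system test: an external script that connects to two systems itself and compares figures that should agree, here a row-count reconciliation between a source table and its staging copy. Both «systems» are stand-in SQLite databases with illustrative names.

```python
import sqlite3

# Stand-ins for the source system and the staging area of the DWH.
source = sqlite3.connect(":memory:")
staging = sqlite3.connect(":memory:")
source.executescript("CREATE TABLE orders (id INT); INSERT INTO orders VALUES (1),(2),(3);")
staging.executescript("CREATE TABLE stg_orders (id INT); INSERT INTO stg_orders VALUES (1),(2),(3);")

def row_count(conn, table):
    # The test tool queries each system directly, outside the ETL chain.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

assert row_count(source, "orders") == row_count(staging, "stg_orders")
```

Because the check runs outside the ETL chain, a bug in the load logic cannot also hide itself in the test, which is the whole point of inter-system testing.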
23. Test Approaches: How do we test?
Functional: specific software «functions», e.g. start the client software, log in to the BI system, edit a report.
Non-functional: more quality-oriented features like performance, usability, (security).
Manually, in combination with checklists & forms; or with classical test automation solutions to test the GUI, performance etc.
24. First Time vs. Regression
First Time Tests
Regression Tests
Manual Testing
Automated Testing
25. Testing per Architecture Layer - Frontend
Information products (e.g. reports, dashboards etc.):
Test the structure based on metadata.
Test the data based on test data and the information product.
Test the layout by comparing a reference layout with the information product.
Manual approach: cell-based comparison.
26. The same frontend tests can also be automated, either BI vendor specific or generic (PDF, XLS, XML…), via cell-based comparison or optical comparison.
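Cell-based comparison on a generic export format can be scripted directly. A sketch for CSV: compare an exported report against a reference file cell by cell and collect the coordinates of every mismatch. The report contents are made up for illustration.

```python
import csv, io

# Stand-ins for a reference export and the report under test.
reference = io.StringIO("Month,Qty\nJan,10\nFeb,12\n")
export    = io.StringIO("Month,Qty\nJan,10\nFeb,13\n")

def diff_cells(ref_file, act_file):
    """Return (row, col, expected, actual) for every differing cell."""
    ref, act = list(csv.reader(ref_file)), list(csv.reader(act_file))
    mismatches = []
    for r, (ref_row, act_row) in enumerate(zip(ref, act)):
        for c, (ref_cell, act_cell) in enumerate(zip(ref_row, act_row)):
            if ref_cell != act_cell:
                mismatches.append((r, c, ref_cell, act_cell))
    return mismatches

print(diff_cells(reference, export))  # → [(2, 1, '12', '13')]
```

A real tool would also flag rows or columns present in only one file; `zip` silently ignores them here, which keeps the sketch short but is something a production comparison must handle.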
27. Testing per Architecture Layer - Backend
Tables in the different layers (Source, Staging, DWH, Data Mart, …):
Test the structure based on metadata (DB schema).
Test the data based on comparison data and actual values.
Test the performance.
Manual or automated, via DWH specific tools, cell-based comparison, or SQL-based comparison.
Source: https://bigeval.com/en/data-warehouse-etl-testing/
28. Test Case Design
Test cases…
… contain one or more test objects (e.g. a report, a measure, a data set)
… need one or more reference objects
… presuppose a congruent data foundation for reference and test objects, i.e. the data is either stable or evolves synchronously.
Stable: define a data set which isn't changed anymore, e.g. a closed time period.
Dynamic: comparison data is refreshed regularly.
29. Testing with test data – where to take it from?
Alternative 1: There is a test source system on which any test cases can be simulated.
Alternative 2: Take the production source system.
Alternative 3: Fictitious source data is generated in the DWH, e.g. on the stage layer.
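Alternative 3 sketched in Python: generate fictitious source rows directly in a stage table. A fixed random seed keeps the generated test data reproducible between runs; the table layout and value ranges are illustrative.

```python
import random, sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stg_event_participant (EventID INT, ParticipantID INT, NoShow TEXT)"
)

rng = random.Random(42)   # fixed seed -> the same test data on every run
rows = [(rng.randint(1, 5), pid, rng.choice("YN")) for pid in range(100)]
conn.executemany("INSERT INTO stg_event_participant VALUES (?, ?, ?)", rows)
```

Seeding matters for test case design: only reproducible data gives the stable reference objects that the previous slide's test cases presuppose.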
30. How big is the amount of test data?
[Timeline: Feature 1 passes through detail analysis, modelling & ETL code, and BI application phases of 1 day – 1 week each, ending with full data testing. The Product Owner defines the test data set; Connectivity & DWH stories and BI Application stories are unit tested with the test data set, and integration tested with the full data set.]
31. BI-specific testing: for you to take away
Testing is a crucial success factor of every BI / DWH system.
Testing should be a «built-in» part of every BI / DWH architecture.
The more tests you have, the more valuable test automation becomes.
Data based testing is not exactly the same as testing «classical» GUI oriented software: adapt where possible, be creative where necessary.
There are BI specific testing tools.
33. Retrospective
Write down your lessons learned – what do you take with you? (Timebox 3 minutes)
Share lessons learned (Timebox 10 minutes)
34. References and Literature
With friendly support from:
IT-Logix Team (http://www.it-logix.ch)
BiGeval Team (http://www.bigeval.com)
Wherescape Team (http://www.wherescape.com)
Tricentis Team (http://www.tricentis.com)
GB&Smith Team (http://www.gbandsmith.com)
Scott Ambler (http://www.disciplinedagiledelivery.com)
Lawrence Corr (http://www.modelstorming.com)
Peter Stevens (https://scrumbreakfast.club)
Maturity Model Inspiration: Belshee Arlo: Agile Engineering Fluency, http://arlobelshee.github.io/AgileEngineeringFluency/Stages_of_practice_map.html
Literature:
Branger Raphael: Bausteine für agile und nachhaltige BI, BI-SPEKTRUM 05/2015, SIGS DATACOM, http://www.tdwi.eu/fileadmin/user_upload/zeitschriften//2015/05/branger_BIS_05_2015_dzer.pdf
Collier Ken: Agile Analytics, Addison-Wesley, 2012
Corr Lawrence, Stagnitto Jim: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema, DecisionOne Press, 2011
Hughes Ralph: Agile Data Warehousing Project Management: Business Intelligence Systems Using Scrum, Morgan Kaufmann, 2012
Ambler Scott W., Lines Mark: Disciplined Agile Delivery: A Practitioner's Guide to Agile Software Delivery in the Enterprise, IBM Press, 2012
Ambler Scott W., Sadalage Pramod J.: Refactoring Databases: Evolutionary Database Design, Addison-Wesley Professional, 2006
Krawatzeck Robert, Zimmer Michael, Trahasch Stephan, Gansor Tom: Agile BI ist in der Praxis angekommen, BI-SPEKTRUM 04/2014
Memorandum für Agile Business Intelligence: http://www.tdwi.eu/wissen/agile-bi/memorandum/
Cramer Oliver: Data Warehouse Automation, 32. TDWI Roundtable in Zürich, 2015
Agile in a nutshell: http://blog.crisp.se/2016/10/09/miakolmodin/poster-on-agile-in-a-nutshell-with-a-spice-of-lean
35. Blogs and Webpages around Data Warehouse Automation
TDWI E-Book Data Warehouse Automation: https://cdn2.hubspot.net/hubfs/461944/downloads/Analyst_Reports/TDWI_ebook_Accelerating_Business.pdf
Barry Devlin: BI, Built to Order, On-demand: Automating data warehouse delivery: http://www.9sight.com/2015/01/wp-built-to-order/
Oliver Cramer: Prinzipien der Data Warehouse Automation und grober Marktüberblick:
http://ddvug.de/wp-content/uploads/4_Tagung_der_DDVUG_Prinzipien_der_Data_Warehouse_Automation_Handout.pdf
Eckerson Group: Data Warehouse Automation Tools: https://www.wherescape.com/media/1791/eckerson-group-dw-automation-tools-report.pdf
What is Data Warehouse Automation: https://www.wherescape.com/products-services/what-is-data-warehouse-automation/
WhereScape RED Product Information: https://www.wherescape.com/products-services/wherescape-red/
WhereScape 3D Product Information: https://www.wherescape.com/media/1590/wherescape-3d-data-sheet.pdf
37. «Traditional projects start with requirements and end with data. Data Warehousing projects start with data and end with requirements.»
Bill Inmon
Raphael Branger, Senior Solution Architect & Partner
rbranger@it-logix.ch
Follow us: @rbranger / @itlogixag
DE: http://blog.it-logix.ch/author/raphael-branger
EN: http://rbranger.wordpress.com