Bharath Nunepalli from HCA presented on how and why HCA implemented an application data archiving and purging strategy using IBM InfoSphere Optim Data Growth Solution for z/OS. HCA needed such a strategy due to application usage growth, retention policies, and database maintenance tasks. Optim archives isolate historical data from current systems while keeping it accessible through familiar tools. HCA implemented archiving by choosing archive paths, creating access definitions and relationships, and building and executing JCL jobs. Limitations of Optim archive include the inability to query archived Db2 tables directly.
Smart Manufacturing and Industry 4.0 - Tibco PoV (Nicola Sandoli)
Smart Manufacturing and Industry 4.0: generating new insights and operational intelligence.
Manufacturers are increasingly relying on advanced analytics to understand their data, anticipate problems, and take proactive steps to prevent costly downtime and improve operational efficiency. Combining real-time sensor data with machine learning techniques surfaces hidden insights into potential equipment failures and operational discrepancies before they happen.
Presentation developed for IoT DevCon in April 2017, Santa Clara, California, USA.
In this deck, I go through the value of data and its derivatives (analytics) from an economic point of view, and then connect that to how this value can be packaged for monetization in the context of IoT. The key technology enabler is the Digital Twin, which is described along with the platforms that support this paradigm. The deck ends with recommendations on how to get ready and how to start with the GE Digital Predix platform.
International Journal of Grid Computing & Applications (IJGCA)
Service-oriented computing is a popular design methodology for large-scale business computing systems. Grid computing enables the sharing of distributed computing and data resources such as processing, networking, and storage capacity to create a cohesive resource environment for executing distributed applications in service-oriented computing. Grid computing also represents a more business-oriented orchestration of relatively homogeneous and powerful distributed computing resources that optimizes the execution of time-consuming processes. Grid computing has received significant and sustained research interest in terms of designing and deploying large-scale, high-performance computational systems in e-Science and business. The objective of the journal is to serve both as the premier venue for presenting foremost research results in the area and as a forum for introducing and exploring new concepts.
From Amazon to Google, top technology firms have embraced data science and machine learning to improve business outcomes. Yet AI adoption beyond these firms has been slow due to obstacles such as hiring talent, heterogeneous data, and compute infrastructure. Larger firms have built teams to tackle these issues with some success, but small- and mid-tier firms are at a distinct disadvantage. AI as a Service is a paradigm that levels the playing field and empowers businesses across the spectrum.
Setup a Data Science Pipeline in a Highly Regulated Environment (Olaf Hein)
Setting up a data science pipeline in a financial institution can be very challenging: many conflicting requirements regarding security, stability, and agility have to be met. These slides describe a proven architecture for such a pipeline, based on Hadoop and web-based notebooks. By setting up separate clusters for different purposes, the proposed solution offers high flexibility for the data science team while meeting security requirements for confidentiality, integrity, and availability.
The slides are from my talk at Big-Data.AI Summit 2019 #BAS19 in Berlin.
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass... (SamanthaBerlant)
See how to consistently deliver accurate COUNT DISTINCT queries in under a second, even on petabyte-scale datasets. This presentation will share Apache Kylin’s approach to COUNT DISTINCT queries for user behavior analysis.
https://www.brighttalk.com/webcast/18317/414006
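The core idea behind Kylin's exact distinct-count measure can be sketched with a toy bitmap index. Kylin itself dictionary-encodes values to integer IDs and stores them in RoaringBitmap structures per cube segment; the Python sets below are a stand-in for those bitmaps, and the segment and user names are invented for illustration:

```python
# Toy sketch of bitmap-based exact COUNT DISTINCT (not Kylin's actual
# implementation, which uses RoaringBitmap over dictionary-encoded IDs).

def encode(dictionary, value):
    """Dictionary-encode a value to a small integer ID."""
    if value not in dictionary:
        dictionary[value] = len(dictionary)
    return dictionary[value]

def build_segment_bitmap(dictionary, values):
    """Precompute a 'bitmap' (here: a set of IDs) for one cube segment."""
    return {encode(dictionary, v) for v in values}

# Two segments (e.g. two days of user events) sharing one global dictionary.
d = {}
day1 = build_segment_bitmap(d, ["alice", "bob", "carol"])
day2 = build_segment_bitmap(d, ["bob", "dave"])

# Exact distinct count across segments is a cheap set union at query time,
# with no rescan of the raw events -- this mergeability is what makes
# sub-second exact answers possible at scale.
print(len(day1 | day2))  # -> 4 distinct users
```

Because bitmaps are precomputed at build time and merge losslessly, the query-time cost is a union rather than a scan, unlike approximate sketches such as HyperLogLog, which trade accuracy for space.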
How to Guarantee Exact COUNT DISTINCT Queries with Sub-Second Latency on Mass...Tyler Wishnoff
See how to consistently deliver accurate COUNT DISTINCT queries in under a second, even on petabyte-scale datasets. This presentation will share Apache Kylin’s approach to COUNT DISTINCT queries for user behavior analysis. Learn more at: https://kyligence.io/
Keynote "Practical case-studies of Industry 4.0 implementation in the global wire and cable manufacturer community" by Clobbi CEO Dmitry Shapovalov, delivered 13 June 2019 at CRU 2019, Brussels.
IBM Cloud Object Storage is a flexible portfolio, built to support a broad spectrum of workloads across a number of industries. We have seen that three industries have a greater affinity for object storage:
• Media & Entertainment
• Health/Life Sciences
• Financial Services
The Data Lake of The University of Queensland: Building the Foundations for ... (Amazon Web Services)
Universities exist in an internationally competitive environment, and whilst the goal is not to increase profit, they have to increase the efficiency of operations and maximise their resources in an ever-changing landscape. The phrase "data is the new oil" is well accepted, but in the complexity of a modern university such as the University of Queensland (UQ), it is no longer feasible to "speculate where to drill" for data. The talk will discuss how UQ stood up its central data lake with AWS Professional Services consultancy to achieve rapid deployment, and how this is serving as a foundation for many new initiatives.
Presenter: David Stockdale, Deputy Director, Infrastructure Operations, University of Queensland
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin (SamanthaBerlant)
With over 450 million customers, Didi (the world's largest rideshare company) conducts complex user behavior analysis on huge datasets daily. Exact Count Distinct is one of Didi's most critical metrics, but it is known for being computationally heavy and notoriously slow. The difference between exact Count Distinct and approximate Count Distinct can cost Didi millions of dollars. In this talk, Kaige Liu of the Apache Kylin project will explain how Didi uses Apache Kylin to return exact Count Distinct on billions of rows of data with sub-second latency, generating the most accurate picture of its business.
You will also learn about the latest developments in modern OLAP technologies. Kaige will share how Didi and Truck Alliance (a truck-hailing company that processes $100 billion worth of goods yearly) use Apache Kylin to power analytics platforms that allow hundreds of analysts to achieve sub-second latency on petabyte-scale data.
This session covers IBM's storage solutions for Artificial Intelligence and Big Data analytics workloads. Presented at IBM TechU in Johannesburg, South Africa, September 2019.
2nd PyData Piraeus meetup - Data Science Initiatives in Titan Cement Company (PyData Piraeus)
TITAN Group is an international cement and building materials producer aspiring to serve the needs of society while contributing to sustainable growth with responsibility and integrity. In the era of Industry 4.0, Titan Group is investing in a Digital Transformation strategy, with several Data Science projects as part of it. Alexandros Tsolkas will introduce Titan Group and its activities and give an overview of how the company applies Data Science, from data management to collaboration with industry experts. Panagiotis Ypsilantis will summarize the Supply Chain Advanced Analytics use cases in the cement industry (Demand Forecasting, Supply Network Optimization, Inventory Optimization) currently under development at Titan, and present in more detail how Titan optimizes its Spare Parts Inventory using forecasting and Monte Carlo simulation models. Finally, Ilias Panagoulias will describe the challenges and results of implementing a Real Time Optimization platform in the cement process.
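As a rough illustration of the Monte Carlo approach to spare-parts inventory sizing mentioned above, the sketch below simulates demand during a replenishment lead time and reads off the smallest stock level covering a target service level. The Poisson-like demand model, the 0.2 failures/day rate, and the 95% service level are all illustrative assumptions, not Titan's actual parameters:

```python
# Illustrative Monte Carlo sizing of a spare-parts reorder point.
# All parameters here are invented for the example.
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def simulate_lead_time_demand(mean_daily_demand, lead_time_days, n_runs=2_000):
    """Simulate total part demand over the lead time, n_runs times."""
    totals = []
    for _ in range(n_runs):
        # Approximate Poisson daily failures with many small Bernoulli trials.
        demand = sum(
            sum(1 for _ in range(20) if random.random() < mean_daily_demand / 20)
            for _ in range(lead_time_days)
        )
        totals.append(demand)
    return totals

def stock_for_service_level(totals, service_level=0.95):
    """Smallest stock level that covers demand in `service_level` of runs."""
    return sorted(totals)[int(service_level * len(totals))]

runs = simulate_lead_time_demand(mean_daily_demand=0.2, lead_time_days=30)
print(stock_for_service_level(runs))  # stock covering ~95% of simulated scenarios
```

The same skeleton extends naturally: plug a demand forecast into `mean_daily_demand`, vary the lead time distribution, and trade the service level against holding cost.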
How to build containerized architectures for deep learning - Data Festival 20...Antje Barth
When it comes to AI data scientists/engineers tend to focus on tools. Though the data platform that enables these tools is equally important, it’s often overlooked. In fact, 90% of the effort required for success in ML is not the algorithm – it’s the data logistics. In this workshop we will talk about common architecture blueprints to integrate AI in your data centers and how the right data platform choice can make all the difference in launching your AI use case into production! Presented at Data Festival Munich, 2019.
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...DataWorks Summit
Learn how Pure Storage engineering manages streaming 190B log events per day and makes use of that deluge of data in our continuous integration (CI) pipeline. Our test infrastructure runs over 70,000 tests per day, creating a triage problem that would otherwise require at least 20 triage engineers. Instead, Spark's flexible computing platform allows us to write a single application for both streaming and batch jobs to understand the state of our CI pipeline with a team of 3 triage engineers. Using encoded patterns, Spark indexes log data for real-time reporting (streaming), uses machine learning for performance modeling and prediction (batch), and finds previous matches for newly encoded patterns (batch). Resource allocation in this mixed environment can be challenging; a containerized Spark cluster deployment and disaggregated compute and storage layers allow us to programmatically shift compute resources between the streaming and batch applications. This talk will go over design decisions made to meet streaming and batch SLAs in hardware, data layout, access patterns, and container strategy. We will also cover the challenges, lessons learned, and best practices for similar data pipelines.
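The "encoded patterns" idea can be sketched in miniature. This is a hypothetical illustration, not Pure Storage's actual pipeline: each pattern is a named regular expression, and indexing a log stream means tagging every line with the IDs of the patterns it matches, producing an index suitable for triage lookups.

```python
import re

# Hypothetical encoded patterns: pattern ID -> compiled regex
# (IDs and regexes are invented for illustration)
PATTERNS = {
    "disk_error": re.compile(r"I/O error on device (\S+)"),
    "test_fail": re.compile(r"FAILED: (\S+)"),
    "oom": re.compile(r"Out of memory"),
}

def index_log_line(line):
    """Return the list of pattern IDs that match this log line."""
    return [pid for pid, rx in PATTERNS.items() if rx.search(line)]

def build_index(lines):
    """Map pattern ID -> matching line numbers, for triage lookups."""
    index = {}
    for lineno, line in enumerate(lines):
        for pid in index_log_line(line):
            index.setdefault(pid, []).append(lineno)
    return index
```

In a streaming setting the same `index_log_line` function would run per event, while `build_index` stands in for the batch job that back-fills matches for newly encoded patterns.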
Speaker
Joshua Robinson, JOSHUA ROBINSON
Founding Engineer
Pure Storage
Driven by data - Why we need a Modern Enterprise Data Analytics PlatformArne Roßmann
In order to turn data into opportunities, you need to build a modern data analytics platform. But because literally everything changes so fast, built-in flexibility is paramount.
This presentation covers:
- how to leverage all your data to generate insights
- the capabilities needed to build a flexible platform
- how to incorporate sustainability requirements
Data Center of the Future: Designing a modernized, high performance computing...Capgemini
With cloud being hailed as the new black, customers are increasingly looking to easily leverage Hybrid Cloud and Hyper-Converged Architecture without a transformation in technology. At VMworld US 2019, Eric Killinger, Director, IT Strategy, Capgemini NA, spoke about how Capgemini makes cloud run better by simplifying infrastructure for your existing landscape via a software-defined data center, supporting immediate OPEX savings, real-time data processing, and cloud-based scalability and cost predictability. He illustrated the joint success with VMware of such a rollout at Hydro One.
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)DataWorks Summit
Most organizations today implement different data stores to support business operations. As a result, data ends up stored across a multitude of often heterogenous systems, like RDBMS, NoSQL, data warehouses, data marts, Hadoop, etc., with limited interaction and/or interoperability between them. The end result is often a vast eco-system of data stores with different "temperature" data, some level of duplication and, no effective way of bringing it all together for business analytics. With such disparate data, how can an organization exploit the wealth of information? This opens up the need for proven techniques to quickly and easily deliver the data to the people who need it. In this session, you'll see how to modernize your enterprise by making data accessible with enterprise capabilities like querying using SQL, granular security for data access, and maintaining high query performance and high concurrency.
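The core idea of a virtual data lake, one SQL statement spanning several independent stores, can be illustrated in miniature with SQLite's ATTACH mechanism. This is a toy stand-in for a federated query engine, not an enterprise product; the table and column names are invented for the example.

```python
import sqlite3

def open_virtual_lake():
    """Attach two independent stores under one SQL connection
    (a toy stand-in for a federated / virtual data lake engine)."""
    conn = sqlite3.connect(":memory:")                       # "operational" store
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")  # second, separate store
    conn.execute("CREATE TABLE main.orders (id INTEGER, customer TEXT)")
    conn.execute("CREATE TABLE warehouse.customers (name TEXT, region TEXT)")
    return conn

def cross_store_report(conn):
    """One SQL statement joining data that lives in two different stores."""
    return conn.execute(
        """SELECT o.id, c.region
           FROM main.orders o
           JOIN warehouse.customers c ON o.customer = c.name
           ORDER BY o.id"""
    ).fetchall()
```

The point of the sketch is the shape of the query, not the engine: a real federation layer adds the granular security, concurrency, and performance features the session describes.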
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
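The "Automated Data Validation" idea in point 4 above can be sketched as a small set of rule-based checks applied to each record at ingestion. The field names and rules below are illustrative assumptions, not a specific product's API:

```python
from datetime import datetime

# Illustrative validation rules: field -> predicate
# (the fields "order_id", "amount", "ts" are invented for this sketch)
RULES = {
    "order_id": lambda v: isinstance(v, str) and v != "",
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
    "ts": lambda v: isinstance(v, datetime),
}

def validate_record(record):
    """Return a list of (field, problem) pairs; an empty list means the record is clean."""
    errors = []
    for field, check in RULES.items():
        if field not in record:
            errors.append((field, "missing"))
        elif not check(record[field]):
            errors.append((field, "invalid"))
    return errors
```

Running checks like these at the source, before data lands in downstream stores, is what keeps errors from propagating; the lineage tracking in the same point tells you where a failing record came from.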
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Short notes on adjusting primitives for graph algorithms such as PageRank. Compressed Sparse Row (CSR) is a compact, adjacency-list based graph representation. The experiments below compare vector primitives (map and reduce) across execution modes and storage types:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
InfoSphere Optim archive for archive/purge of application data
IBM & IDUG 2019 Data Tech Summit
#Db2World #IDUGDb2 #IBMDb2
Bharath Nunepalli, Sr. Db2 DBA
HCA
10/2/19, 2:20 PM
How and why to archive & purge application data?
IBM & IDUG Data Tech Summit
Silicon Valley Lab | October 2-4, 2019

Agenda
- About HCA Inc.
- Our ERP environment
- Why we needed a data archive strategy
- What is Optim archive?
- How did we achieve data archive and purge?
- Limitations in using Optim archive tool
- Q&A
HCA Inc. – Some facts about us:
- Named one of the world’s most ethical companies for nine years in a row
- 184 hospitals and approximately 2,000 sites of care, including surgery centers, freestanding ERs, urgent care centers, and physician clinics in 21 states and the United Kingdom
- Ranked 63rd in the Fortune 500
- 249,000 employees, 38,000 active physicians, 90,000 nurses, and 5,300 IT employees
- 28 million patient encounters per year; 8.6 million emergency visits per year
Enterprise Resource Planning (ERP) environment:
(Diagram: the ERP system encompasses Financials, Payroll, Supply Chain, Resource Planning, and HR.)
- 120+ databases and different swim lanes supporting ERP development and maintenance
- 1,000+ tablespaces, 2,800 tables, and 7,500 indexes per database
- Largest table has 1.5+ billion rows and 7 indexes
Why we needed a data archive strategy
- Application usage growth
- Retention policies
- Tiresome DBA tasks
- Vendor limitations
What is Optim archive?
IBM InfoSphere Optim Data Growth Solution for z/OS provides everything you need to create and manage archives of relationally intact data from databases with any number of tables and relationships. Using the archiving features in Optim Data Growth Solution for z/OS, you can:
• Isolate historical data from current activity and safely remove it to a secure archive.
• Access archived historical data easily, using familiar tools and interfaces.
• Restore archived data to its original business context when it requires additional processing.
• Build repetitive processes that can be executed whenever needed.
Example 1
(Diagram: Optim archives data (1) from the production operational database on the mainframe to an Archive File, then purges it (2) from the source. Archived data is accessed through ODM by ODBC/JDBC reporting tools.)
Example 2
(Diagram: Optim archives data (1) from the production operational database to an Archive File, restores it (2) into an archive database, purges it (3) from the source, and creates an FTP file (4). Archived data is accessed through ODM by ODBC/JDBC reporting tools.)
Example 3
(Diagram: Optim archives data (1) from the production operational database to an Archive File, restores it (2) into an archive database on Db2 LUW, Oracle, or SQL Server, and purges it (3) from the source. Archived data is accessed by ODBC/JDBC reporting tools.)
How did we achieve data archive and purge?
1. Choosing the suitable archive path
2. Creating Access Definitions (AD) and relationships
3. Building and executing JCLs
1. Choosing the suitable archive path
(Diagram: Optim archives data (1) from the production operational database on the mainframe to an Archive File, restores it (2) into an archive database, purges it (3) from the source, and reorgs (4) the source.)
2. Creating Access Definitions (AD) and relationships
An Access Definition describes the data to be extracted from the source database. The components of an Access Definition include the following:
- A list of tables from which the data is extracted.
- Selection criteria (the WHERE clause of the SQL query).
- The list of relationships to be traversed.
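Conceptually, an Access Definition bundles exactly those three components. The sketch below models one in Python purely for illustration; the field names and structure are assumptions, not Optim's actual definition format (the table names are taken from the deck's EMPLOYEE/HRHISTORY example):

```python
# Hypothetical sketch of an Access Definition's components
# (illustrative field names, not Optim's actual file format)
access_definition = {
    "start_table": "creator.HRHISTORY",
    "tables": ["creator.HRHISTORY", "creator.EMPLOYEE"],
    # Selection criteria: the WHERE clause applied to the start table
    "selection_criteria": "YEAR(DATE_STAMP) <= 2015",
    # Relationships to traverse when extracting related rows
    "relationships": [
        {"parent": "creator.EMPLOYEE", "child": "creator.HRHISTORY",
         "on": ["COMPANY", "EMPLOYEE"]},
    ],
}

def tables_to_extract(ad):
    """List the tables an archive job built from this AD would touch."""
    return ad["tables"]
```

The relationship entry is what keeps the archive "relationally intact": rows pulled from the child table stay joined to their parents.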
SELECT * FROM creator.EMPLOYEE;

SELECT *
FROM creator.HRHISTORY A
INNER JOIN creator.EMPLOYEE B
  ON A.COMPANY = B.COMPANY
 AND A.EMPLOYEE = B.EMPLOYEE
WHERE YEAR(A.DATE_STAMP) <= archive_year;
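In a generic archive-and-purge pattern, selection criteria like the ones above drive both steps: qualifying rows are first copied to the archive, then deleted from the source. A minimal sketch, using sqlite3 as a stand-in for Db2 (the real Optim flow writes to an Archive File rather than another table in the same database, and `archive_year` is a parameter):

```python
import sqlite3

def archive_and_purge(conn, archive_year):
    """Copy HRHISTORY rows up to archive_year into an archive table, then purge them.
    Illustrative sketch only: sqlite3 stands in for Db2, and sqlite's strftime
    stands in for Db2's YEAR() scalar function."""
    cur = conn.cursor()
    # Archive: copy qualifying rows (selection criteria from the slides)
    cur.execute(
        """INSERT INTO HRHISTORY_ARCHIVE
           SELECT A.* FROM HRHISTORY A
           JOIN EMPLOYEE B ON A.COMPANY = B.COMPANY AND A.EMPLOYEE = B.EMPLOYEE
           WHERE CAST(strftime('%Y', A.DATE_STAMP) AS INTEGER) <= ?""",
        (archive_year,),
    )
    # Purge: delete the same rows from the source
    cur.execute(
        """DELETE FROM HRHISTORY
           WHERE CAST(strftime('%Y', DATE_STAMP) AS INTEGER) <= ?
             AND EXISTS (SELECT 1 FROM EMPLOYEE B
                         WHERE B.COMPANY = HRHISTORY.COMPANY
                           AND B.EMPLOYEE = HRHISTORY.EMPLOYEE)""",
        (archive_year,),
    )
    conn.commit()
```

In the Optim flow these two steps correspond to the Archive and Purge jobs built in the next section, executed as JCL on the mainframe rather than as inline SQL.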
3. Building and executing JCLs
Just some stats to wow you!!!
Limitations in using Optim archive tool
(Diagram: archived Db2 tables cannot be queried directly with SQL. The workaround is to run the SQL query (1) into a Db2 table, then archive (2), restore (3) into the archive database, purge (4), and reorg (5). An enhancement request is tracked as RFE#OPTIM-I-126.)
Special Thanks to
Greg Czaja (greg.czaja@unicomsi.com)