Presentation September 9 2013 PPAM 2013 Warsaw
Economic Imperative: There are a lot of data and a lot of jobs
Computing Model: Industry adopted clouds which are attractive for data analytics. HPC also useful in some cases
Progress in scalable robust Algorithms: new data need different algorithms than before
Progress in Data Intensive Programming Models
Progress in Data Science Education: opportunities at universities
Reading lists as open data - Meeting the Reading List Challenge 2016Martin Hamilton
1. The document discusses an open data project involving Jisc, Universities UK, and the Open Data Institute to make university reading lists openly available.
2. The project aims to collaborate across universities to publish reading list data in order to power applications like a book recommendation app and identify popular texts for potential deals.
3. Next steps could include using the consolidated open reading list data to recommend new texts, identify books to remove from lists, and monitor adoption of open textbooks between institutions. Barriers to sharing may include lack of common data standards.
Supercomputing and the cloud - the next big paradigm shift?Martin Hamilton
How can cloud technologies help us to address the challenges of re-use of research data and software and reproducibility of experiments? My slides from the University of Birmingham BEARcloud launch event, October 2016
This document discusses the key factors that contributed to the recent boom in deep learning. It identifies better neural network algorithms/techniques, large datasets, massive parallelization using GPUs, and industry investment as major enabling factors. In particular, it highlights how the availability of large, labeled datasets like ImageNet; developments in CNNs, autoencoders, and other neural network architectures; the use of GPUs to enable efficient parallel training; and large-scale research at tech companies like Google were central to recent advances in deep learning.
Health and clinical research - data futures, NIHR accelerating digital programmeMartin Hamilton
The document discusses health and clinical research data futures. It describes Jisc's role in supporting research through services like the Janet network and shared data centers. Safe sharing of encrypted electronic health data is enabled between organizations. Work is being done to provide cloud services for research while ensuring compliance with legal and regulatory requirements. Emerging technologies like storing digital data in DNA, programming biology, and machine learning applied to healthcare are discussed as shaping future data possibilities and workforce needs. Skills in digital literacy, leadership, and navigating new legal aspects will be important as these technologies change healthcare.
The Graph Structure of the Web - Aggregated by Pay-Level Domainoli-unima
The document summarizes research on analyzing the structure of the 2012 web graph when aggregated by pay-level domain (PLD) rather than by individual pages. Some key findings include: the indegree distribution follows a power law but the outdegree distribution does not; the bow-tie structure is unbalanced with a large OUT component compared to previous studies; approximately 42% of domains are connected by paths and the average path length is 4.27 hops; and high connectivity depends more on links to hubs than on hubs themselves. Analysis of topic-specific subgraphs and the public suffix graph show varying patterns of internal and external links.
Big data and the dark arts - Jisc Digital Media 2015Jisc
There still remains a certain misunderstanding by the very definition of "big data" and the perceived hype around the term. This workshop clarified the concepts and give examples of relevant big data projects.
Big Data HPC Convergence and a bunch of other thingsGeoffrey Fox
This talk supports the Ph.D. in Computational & Data Enabled Science & Engineering at Jackson State University. It describes related educational activities at Indiana University, the Big Data phenomena, jobs and HPC and Big Data computations. It then describes how HPC and Big Data can be converged into a single theme.
Reading lists as open data - Meeting the Reading List Challenge 2016Martin Hamilton
1. The document discusses an open data project involving Jisc, Universities UK, and the Open Data Institute to make university reading lists openly available.
2. The project aims to collaborate across universities to publish reading list data in order to power applications like a book recommendation app and identify popular texts for potential deals.
3. Next steps could include using the consolidated open reading list data to recommend new texts, identify books to remove from lists, and monitor adoption of open textbooks between institutions. Barriers to sharing may include lack of common data standards.
Supercomputing and the cloud - the next big paradigm shift?Martin Hamilton
How can cloud technologies help us to address the challenges of re-use of research data and software and reproducibility of experiments? My slides from the University of Birmingham BEARcloud launch event, October 2016
This document discusses the key factors that contributed to the recent boom in deep learning. It identifies better neural network algorithms/techniques, large datasets, massive parallelization using GPUs, and industry investment as major enabling factors. In particular, it highlights how the availability of large, labeled datasets like ImageNet; developments in CNNs, autoencoders, and other neural network architectures; the use of GPUs to enable efficient parallel training; and large-scale research at tech companies like Google were central to recent advances in deep learning.
Health and clinical research - data futures, NIHR accelerating digital programmeMartin Hamilton
The document discusses health and clinical research data futures. It describes Jisc's role in supporting research through services like the Janet network and shared data centers. Safe sharing of encrypted electronic health data is enabled between organizations. Work is being done to provide cloud services for research while ensuring compliance with legal and regulatory requirements. Emerging technologies like storing digital data in DNA, programming biology, and machine learning applied to healthcare are discussed as shaping future data possibilities and workforce needs. Skills in digital literacy, leadership, and navigating new legal aspects will be important as these technologies change healthcare.
The Graph Structure of the Web - Aggregated by Pay-Level Domainoli-unima
The document summarizes research on analyzing the structure of the 2012 web graph when aggregated by pay-level domain (PLD) rather than by individual pages. Some key findings include: the indegree distribution follows a power law but the outdegree distribution does not; the bow-tie structure is unbalanced with a large OUT component compared to previous studies; approximately 42% of domains are connected by paths and the average path length is 4.27 hops; and high connectivity depends more on links to hubs than on hubs themselves. Analysis of topic-specific subgraphs and the public suffix graph show varying patterns of internal and external links.
Big data and the dark arts - Jisc Digital Media 2015Jisc
There still remains a certain misunderstanding by the very definition of "big data" and the perceived hype around the term. This workshop clarified the concepts and give examples of relevant big data projects.
Big Data HPC Convergence and a bunch of other thingsGeoffrey Fox
This talk supports the Ph.D. in Computational & Data Enabled Science & Engineering at Jackson State University. It describes related educational activities at Indiana University, the Big Data phenomena, jobs and HPC and Big Data computations. It then describes how HPC and Big Data can be converged into a single theme.
Web search-metrics-tutorial-www2010-section-1of7-introductionAli Dasdan
This document provides an introduction to a tutorial on web search engine metrics for measuring user satisfaction. It discusses the need for metrics to measure and improve search engines. It outlines the typical search engine pipeline and how metrics can evaluate different parts of the pipeline from a user and system perspective. The document then covers various considerations for collecting and analyzing metrics, such as sampling methods, metric dimensions, and challenges. It concludes by listing some key open problems in metrics and providing references for further reading.
The slides for my talk on "HPC as a service" at the 25th anniversary Machine Evaluation Workshop in December 2014. I cover Jisc's HPC brokerage and related initiatives including our shared data centre, industry connectivity to Janet, our VAT cost sharing group, and our pilot of the Kit-Catalogue equipment sharing database.
Digital Transformation of Civil Engineering and Constructionpdemian
Delivered on 30th June 2020, ‘Emerging fields in Civil Engineering’, International Webinar for Students,Easwari Engineering College, Chennai, India (Online)
In this fifth session of the Elements of AI Luxembourg series of webinars, our guest speaker and co-organizer Prof. Martin Theobald talks about Current Topics and Trends in Big Data Analytics. More information, and a recording of the session, can be found on our reddit page:
eofai.lu/reddit
Digital Transformation of Civil Engineering and Constructionpdemian
This document summarizes a presentation on the digital transformation of civil engineering and construction. It discusses drivers for digital transformation like client demands for more information and improved productivity. It also discusses the potential for a national digital twin and recent research projects. These include a BIM search engine called 3DIR, identifying national capabilities needed for information management, and applications of augmented and virtual reality. The presentation concludes that the UK is a world leader in areas like mandating BIM use and is in an exciting time for digital transformation in the built environment sector.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
This document discusses cloud programming models. It begins by defining programming models and noting that they provide an abstraction of a computer system through a language, libraries and runtime system. It then lists some key characteristics of a cloud programming model including efficiency, scalability, fault tolerance and data models. The document outlines an agenda to cover programming models for compute-intensive and big data workloads. It provides examples of bags of tasks and workflow programming models and their applications in fields like bioinformatics.
Big Data in the Cloud: Enabling the Fourth Paradigm by Matching SMEs with Dat...Alexandru Iosup
Data are pouring in, and defining and providing data-processing services at massive scale, in short, Big Data services, could significantly improve the revenue of Europe's Small and Medium Enterprises (SMEs). A paradigm shift is about occur, one in which data processing becomes a basic life utility, for both SMEs and the European people. Although the burgeoning datacenter industry, of which the Netherlands is a top player in Europe, is promising to enable Big Data services, the architectures and even infrastructure for these services are still lagging behind in performance, efficiency, and sophistication, and are built as monoliths reminding us of traditional data silos. Can we remove the performance and efficiency limitations of the current Big Data ecosystems, that is, of the complex stacks of middleware that are currently in use, for Big Data services? In this talk, I will present several use cases (workloads) of Big Data services for time-stamped [2,3] and graph data [4], evaluate or benchmark the performance of several Big Data stacks [3,4] for these use-cases, and present a path (and promising early results) to providing a generic, data-agnostic, non-monolithic Big Data architecture that can efficiently and elastically use datacenter resources via cloud computing interfaces [1,5].
[1] A. L. Varbanescu and A. Iosup, On Many-Task Big Data Processing: from GPUs to Clouds. Proc. of SC|12 (MTAGS).? http://www.pds.ewi.tudelft.nl/~iosup/many-tasks-big-data-vision13mtags_v100.pdf
[2] de Ruiter and Iosup. A workload model for MapReduce. MSc thesis at TU Delft. Jun 2012. Available online via TU Delft Library, http://library.tudelft.nl
[3] Hegeman, Ghit, Capotã, Hidders, Epema, Iosup. The BTWorld Use Case for Big Data Analytics: Description, MapReduce Logical Workflow, and Empirical Evaluation. IEEE Big Data 2013. http://www.pds.ewi.tudelft.nl/~iosup/btworld-mapreduce-workflow13ieeebigdata.pdf
[4] Y. Guo, M. Biczak, A. L. Varbanescu, A. Iosup, C. Martella, and T. L. Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis. IEEE IPDPS 2014. http://www.pds.ewi.tudelft.nl/~iosup/perf-eval-graph-proc14ipdps.pdf
[5] B. Ghit, N. Yigitbasi, A. Iosup, and D. Epema. Balanced Resource Allocations Across Multiple Dynamic MapReduce Clusters. ACM SIGMETRICS 2014. http://pds.twi.tudelft.nl/~iosup/dynamic-mapreduce14sigmetrics.pdf
The document discusses the evolution of the semantic web and big data. It provides examples of how semantic web technologies can be applied to large datasets from domains such as climate research. It also discusses linked open data and the growth of the linked open data cloud over time. Public open data initiatives are described along with the benefits of a data economy where non-tangible assets like data play a significant role.
Jisc - Rebooting a National Innovation Agency (EUNIS 2014)Martin Hamilton
This is my presentation on "Rebooting" Jisc, from the EUNIS 2014 Congress at Umeå, Sweden. I begin by introducing Jisc, for anyone not already familiar with who we are and what we do. I highlight a few of our success stories that the EUNIS audience might not be familiar with, talk about some current projects - and how our focus and structure has changed following the Wilson Review. I close with our mission statement and vision for 2020.
The research data spring project "DataVault" slides for the third sandpit workshop. Project led by University of Manchester and University of Edinburgh.
Makers Go To College - Your Digital Future 2016Martin Hamilton
Young digital makers will need a new kind of college - some thoughts from me, presented at the City of Liverpool College Your Digital Future event in June 2016.
Evolving the Web into a Global Dataspace – Advances and ApplicationsChris Bizer
Keynote talk at the 18th International Conference on Business Information Systems, 24-26 June 2015, Poznań, Poland
URL:
http://bis.kie.ue.poznan.pl/bis2015/keynote-speakers/
Abstract:
Motivated by Google, Yahoo!, Microsoft, and Facebook, hundreds of thousands of websites have started to annotate structured data within their pages using markup formats such as Microdata, RDFa, and Microformats. In parallel, the adoption of Linked Data technologies by government agencies, libraries, and scientific institutions has risen considerably. In his talk, Christian Bizer will give an overview of the content profile of the resulting Web of Data. He will showcase applications that exploit the Web of Data and will discuss the challenges of integrating and cleansing data from thousands of independent Web data sources.
Under the grid computing paradigm, large sets of heterogeneous resources can be aggregated and shared. Grid development and acceptance hinge on proving that grids reliably support real applications, and on creating adequate benchmarks to quantify this support. However, applications of grids (and clouds) are just beginning to emerge, and traditional benchmarks have yet to prove representative in grid environments. To address this chicken-and-egg problem, we propose a middle-way approach: create and run synthetic grid workloads comprised of applications representative for today's grids (and clouds). For this purpose, we have designed and implemented GrenchMark, a framework for synthetic workload generation and submission. The framework greatly facilitates synthetic workload modeling, comes with over 35 synthetic and real applications, and is extensible and flexible. We show how the framework can be used for grid system analysis, functionality testing in grid environments, and for comparing different grid settings, and present the results obtained with GrenchMark in our multi-cluster grid, the DAS.
Putting Data to Work: Moving science forward together beyond where we thought...Erin Robinson
This document discusses how putting data to work through community. It outlines the traditional approach of individual science projects versus a community approach. The traditional approach involves scientists independently finding, accessing, analyzing and publishing data. The community approach advocates opening this process up through shared infrastructure and standards to allow more collaborative data reuse. It provides examples of communities like the air quality community that have worked to develop interoperable standards and services. Overall, it argues that a community approach where data and standards are shared can lead to more open science and greater data reuse.
The future of cloud computing - Jisc Digifest 2016Jisc
In Jisc's future of cloud computing horizon scan report, we identified three strategic areas where Jisc could support universities and colleges in moving to the cloud – cloud as a utility, app as a service, and working to build capability in cloud technologies.
Come along to this session to hear more about this work from Jisc futurist Martin Hamilton, and find out how you can get involved.
The Safe Share Project is a pilot project running from 2014-2017 that enables the secure exchange of health data between universities and research institutions. It uses an encrypted overlay network over Janet to facilitate analysis while protecting sensitive data. The goal is to further medical research on diseases and treatments through collaborative analysis of data, in a way that maintains public trust through secure handling of personal information.
1. The document describes a study that aimed to develop an open government data (OGD) platform that integrates OGD and social media features to better stimulate value generation from OGD.
2. Researchers designed a prototype platform with features like data processing, feedback/collaboration, data quality ratings, and grouping/interaction capabilities.
3. An evaluation of the prototype found that users appreciated the novel social media-inspired features and found them useful for collaborating around OGD.
This document provides an overview of a lecture on big data analytics given by Dr. Ching-Yung Lin. The key points covered in the lecture include:
- Definitions and characteristics of big data based on the 3V's of volume, velocity and variety.
- Techniques used for big data such as massive parallelism, distributed storage and processing, machine learning and data visualization.
- Factors that have enabled big data to become prominent in recent years like greater data collection, open source software and commodity hardware.
- Examples of big data platforms, databases and analytics techniques including Hadoop, Spark, NoSQL databases and graph databases.
- The large and growing market for big data
The document provides an overview of big data analytics. It defines big data as high-volume, high-velocity, and high-variety information assets that require cost-effective and innovative forms of processing for insights and decision making. Big data is characterized by the 3Vs - volume, velocity, and variety. The emergence of big data is driven by the massive amount of data now being generated and stored, availability of open source tools, and commodity hardware. The course will cover Apache Hadoop, Apache Spark, streaming analytics, visualization, linked data analysis, and big data systems and AI solutions.
Web search-metrics-tutorial-www2010-section-1of7-introductionAli Dasdan
This document provides an introduction to a tutorial on web search engine metrics for measuring user satisfaction. It discusses the need for metrics to measure and improve search engines. It outlines the typical search engine pipeline and how metrics can evaluate different parts of the pipeline from a user and system perspective. The document then covers various considerations for collecting and analyzing metrics, such as sampling methods, metric dimensions, and challenges. It concludes by listing some key open problems in metrics and providing references for further reading.
The slides for my talk on "HPC as a service" at the 25th anniversary Machine Evaluation Workshop in December 2014. I cover Jisc's HPC brokerage and related initiatives including our shared data centre, industry connectivity to Janet, our VAT cost sharing group, and our pilot of the Kit-Catalogue equipment sharing database.
Digital Transformation of Civil Engineering and Constructionpdemian
Delivered on 30th June 2020, ‘Emerging fields in Civil Engineering’, International Webinar for Students,Easwari Engineering College, Chennai, India (Online)
In this fifth session of the Elements of AI Luxembourg series of webinars, our guest speaker and co-organizer Prof. Martin Theobald talks about Current Topics and Trends in Big Data Analytics. More information, and a recording of the session, can be found on our reddit page:
eofai.lu/reddit
Digital Transformation of Civil Engineering and Constructionpdemian
This document summarizes a presentation on the digital transformation of civil engineering and construction. It discusses drivers for digital transformation like client demands for more information and improved productivity. It also discusses the potential for a national digital twin and recent research projects. These include a BIM search engine called 3DIR, identifying national capabilities needed for information management, and applications of augmented and virtual reality. The presentation concludes that the UK is a world leader in areas like mandating BIM use and is in an exciting time for digital transformation in the built environment sector.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
This document discusses cloud programming models. It begins by defining programming models and noting that they provide an abstraction of a computer system through a language, libraries and runtime system. It then lists some key characteristics of a cloud programming model including efficiency, scalability, fault tolerance and data models. The document outlines an agenda to cover programming models for compute-intensive and big data workloads. It provides examples of bags of tasks and workflow programming models and their applications in fields like bioinformatics.
Big Data in the Cloud: Enabling the Fourth Paradigm by Matching SMEs with Dat...Alexandru Iosup
Data are pouring in, and defining and providing data-processing services at massive scale, in short, Big Data services, could significantly improve the revenue of Europe's Small and Medium Enterprises (SMEs). A paradigm shift is about occur, one in which data processing becomes a basic life utility, for both SMEs and the European people. Although the burgeoning datacenter industry, of which the Netherlands is a top player in Europe, is promising to enable Big Data services, the architectures and even infrastructure for these services are still lagging behind in performance, efficiency, and sophistication, and are built as monoliths reminding us of traditional data silos. Can we remove the performance and efficiency limitations of the current Big Data ecosystems, that is, of the complex stacks of middleware that are currently in use, for Big Data services? In this talk, I will present several use cases (workloads) of Big Data services for time-stamped [2,3] and graph data [4], evaluate or benchmark the performance of several Big Data stacks [3,4] for these use-cases, and present a path (and promising early results) to providing a generic, data-agnostic, non-monolithic Big Data architecture that can efficiently and elastically use datacenter resources via cloud computing interfaces [1,5].
[1] A. L. Varbanescu and A. Iosup, On Many-Task Big Data Processing: from GPUs to Clouds. Proc. of SC|12 (MTAGS).? http://www.pds.ewi.tudelft.nl/~iosup/many-tasks-big-data-vision13mtags_v100.pdf
[2] de Ruiter and Iosup. A workload model for MapReduce. MSc thesis at TU Delft. Jun 2012. Available online via TU Delft Library, http://library.tudelft.nl
[3] Hegeman, Ghit, Capotã, Hidders, Epema, Iosup. The BTWorld Use Case for Big Data Analytics: Description, MapReduce Logical Workflow, and Empirical Evaluation. IEEE Big Data 2013. http://www.pds.ewi.tudelft.nl/~iosup/btworld-mapreduce-workflow13ieeebigdata.pdf
[4] Y. Guo, M. Biczak, A. L. Varbanescu, A. Iosup, C. Martella, and T. L. Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis. IEEE IPDPS 2014. http://www.pds.ewi.tudelft.nl/~iosup/perf-eval-graph-proc14ipdps.pdf
[5] B. Ghit, N. Yigitbasi, A. Iosup, and D. Epema. Balanced Resource Allocations Across Multiple Dynamic MapReduce Clusters. ACM SIGMETRICS 2014. http://pds.twi.tudelft.nl/~iosup/dynamic-mapreduce14sigmetrics.pdf
The document discusses the evolution of the semantic web and big data. It provides examples of how semantic web technologies can be applied to large datasets from domains such as climate research. It also discusses linked open data and the growth of the linked open data cloud over time. Public open data initiatives are described along with the benefits of a data economy where non-tangible assets like data play a significant role.
Jisc - Rebooting a National Innovation Agency (EUNIS 2014)Martin Hamilton
This is my presentation on "Rebooting" Jisc, from the EUNIS 2014 Congress at Umeå, Sweden. I begin by introducing Jisc, for anyone not already familiar with who we are and what we do. I highlight a few of our success stories that the EUNIS audience might not be familiar with, talk about some current projects - and how our focus and structure has changed following the Wilson Review. I close with our mission statement and vision for 2020.
The research data spring project "DataVault" slides for the third sandpit workshop. Project led by University of Manchester and University of Edinburgh.
Makers Go To College - Your Digital Future 2016Martin Hamilton
Young digital makers will need a new kind of college - some thoughts from me, presented at the City of Liverpool College Your Digital Future event in June 2016.
Evolving the Web into a Global Dataspace – Advances and ApplicationsChris Bizer
Keynote talk at the 18th International Conference on Business Information Systems, 24-26 June 2015, Poznań, Poland
URL:
http://bis.kie.ue.poznan.pl/bis2015/keynote-speakers/
Abstract:
Motivated by Google, Yahoo!, Microsoft, and Facebook, hundreds of thousands of websites have started to annotate structured data within their pages using markup formats such as Microdata, RDFa, and Microformats. In parallel, the adoption of Linked Data technologies by government agencies, libraries, and scientific institutions has risen considerably. In his talk, Christian Bizer will give an overview of the content profile of the resulting Web of Data. He will showcase applications that exploit the Web of Data and will discuss the challenges of integrating and cleansing data from thousands of independent Web data sources.
Under the grid computing paradigm, large sets of heterogeneous resources can be aggregated and shared. Grid development and acceptance hinge on proving that grids reliably support real applications, and on creating adequate benchmarks to quantify this support. However, applications of grids (and clouds) are just beginning to emerge, and traditional benchmarks have yet to prove representative in grid environments. To address this chicken-and-egg problem, we propose a middle-way approach: create and run synthetic grid workloads comprised of applications representative for today's grids (and clouds). For this purpose, we have designed and implemented GrenchMark, a framework for synthetic workload generation and submission. The framework greatly facilitates synthetic workload modeling, comes with over 35 synthetic and real applications, and is extensible and flexible. We show how the framework can be used for grid system analysis, functionality testing in grid environments, and for comparing different grid settings, and present the results obtained with GrenchMark in our multi-cluster grid, the DAS.
Putting Data to Work: Moving science forward together beyond where we thought...Erin Robinson
This document discusses how putting data to work through community. It outlines the traditional approach of individual science projects versus a community approach. The traditional approach involves scientists independently finding, accessing, analyzing and publishing data. The community approach advocates opening this process up through shared infrastructure and standards to allow more collaborative data reuse. It provides examples of communities like the air quality community that have worked to develop interoperable standards and services. Overall, it argues that a community approach where data and standards are shared can lead to more open science and greater data reuse.
The future of cloud computing - Jisc Digifest 2016Jisc
In Jisc's future of cloud computing horizon scan report, we identified three strategic areas where Jisc could support universities and colleges in moving to the cloud – cloud as a utility, app as a service, and working to build capability in cloud technologies.
Come along to this session to hear more about this work from Jisc futurist Martin Hamilton, and find out how you can get involved.
The Safe Share Project is a pilot project running from 2014-2017 that enables the secure exchange of health data between universities and research institutions. It uses an encrypted overlay network over Janet to facilitate analysis while protecting sensitive data. The goal is to further medical research on diseases and treatments through collaborative analysis of data, in a way that maintains public trust through secure handling of personal information.
1. The document describes a study that aimed to develop an open government data (OGD) platform that integrates OGD and social media features to better stimulate value generation from OGD.
2. Researchers designed a prototype platform with features like data processing, feedback/collaboration, data quality ratings, and grouping/interaction capabilities.
3. An evaluation of the prototype found that users appreciated the novel social media-inspired features and found them useful for collaborating around OGD.
This document provides an overview of a lecture on big data analytics given by Dr. Ching-Yung Lin. The key points covered in the lecture include:
- Definitions and characteristics of big data based on the 3V's of volume, velocity and variety.
- Techniques used for big data such as massive parallelism, distributed storage and processing, machine learning and data visualization.
- Factors that have enabled big data to become prominent in recent years like greater data collection, open source software and commodity hardware.
- Examples of big data platforms, databases and analytics techniques including Hadoop, Spark, NoSQL databases and graph databases.
- The large and growing market for big data
The document provides an overview of big data analytics. It defines big data as high-volume, high-velocity, and high-variety information assets that require cost-effective and innovative forms of processing for insights and decision making. Big data is characterized by the 3Vs - volume, velocity, and variety. The emergence of big data is driven by the massive amount of data now being generated and stored, availability of open source tools, and commodity hardware. The course will cover Apache Hadoop, Apache Spark, streaming analytics, visualization, linked data analysis, and big data systems and AI solutions.
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Geoffrey Fox
Motivating Introduction to MOOC on Big Data from an applications point of view https://bigdatacoursespring2014.appspot.com/course
Course says:
Geoffrey motivates the study of X-informatics by describing data science and clouds. He starts with striking examples of the data deluge with examples from research, business and the consumer. The growing number of jobs in data science is highlighted. He describes industry trend in both clouds and big data.
He introduces the cloud computing model developed at amazing speed by industry. The 4 paradigms of scientific research are described with growing importance of data oriented version. He covers 3 major X-informatics areas: Physics, e-Commerce and Web Search followed by a broad discussion of cloud applications. Parallel computing in general and particular features of MapReduce are described. He comments on a data science education and the benefits of using MOOC's.
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Center...Geoffrey Fox
Motivating Introduction to MOOC on Big Data from an applications point of view https://bigdatacoursespring2014.appspot.com/course
Course says:
Geoffrey motivates the study of X-informatics by describing data science and clouds. He starts with striking examples of the data deluge with examples from research, business and the consumer. The growing number of jobs in data science is highlighted. He describes industry trend in both clouds and big data.
He introduces the cloud computing model developed at amazing speed by industry. The 4 paradigms of scientific research are described with growing importance of data oriented version. He covers 3 major X-informatics areas: Physics, e-Commerce and Web Search followed by a broad discussion of cloud applications. Parallel computing in general and particular features of MapReduce are described. He comments on a data science education and the benefits of using MOOC's.
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...LIBER Europe
A presentation by Dr. Liz Lyon of the United Kingdom Office for Library and Information Networking, as given at LIBER's 42nd annual conference in Munich, Germany.
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...Geoffrey Fox
Most things are dominated by Artificial Intelligence (AI). Technology Companies like Amazon, Google, Facebook, and Microsoft are AI First organizations.
Engineering achievement today is highlighted by the AI buried in a vehicle or machine. Industry (Manufacturing) 4.0 focusses on the AI-Driven future of the Industrial Internet of Things.
Software is eating the world.
We can describe much computer systems work as designing, building and using the Global AI and Modelling supercomputer which itself is autonomously tuned by AI. We suggest that this is not just a bunch of buzzwords but has profound significance and examine consequences of this for education and research.
Naively high-performance computing should be relevant for the AI supercomputer but somehow the corporate juggernaut is not making so much use of it. We discuss how to change this.
Cloud for Research and Innovation - UK USA HPC workshop, Oxford, July 205Martin Hamilton
How can public cloud and technologies like Docker and OpenStack help to deliver next generation scientific computing infrastructure? My talk for the UK/USA HPC workshop in July 2015, organized by HPC-SIG (UK) and CASC (USA).
Too often I hear the question “Can you help me with our data strategy?” Unfortunately, for most, this is the wrong request because it focuses on the least valuable component: the data strategy itself. A more useful request is: “Can you help me apply data strategically?” Yes, at early maturity phases the process of developing strategic thinking about data is more important than the actual product! Trying to write a good (must less perfect) data strategy on the first attempt is generally not productive –particularly given the widespread acceptance of Mike Tyson’s truism: “Everybody has a plan until they get punched in the face.” This program refocuses efforts on learning how to iteratively improve the way data is strategically applied. This will permit data-based strategy components to keep up with agile, evolving organizational strategies. It also contributes to three primary organizational data goals. Learn how to improve the following:
- Your organization’s data
- The way your people use data
- The way your people use data to achieve your organizational strategy
This will help in ways never imagined. Data are your sole non-depletable, non-degradable, durable strategic assets, and they are pervasively shared across every organizational area. Addressing existing challenges programmatically includes overcoming necessary but insufficient prerequisites and developing a disciplined, repeatable means of improving business objectives. This process (based on the theory of constraints) is where the strategic data work really occurs as organizations identify prioritized areas where better assets, literacy, and support (data strategy components) can help an organization better achieve specific strategic objectives. Then the process becomes lather, rinse, and repeat. Several complementary concepts are also covered, including:
- A cohesive argument for why data strategy is necessary for effective data governance
- An overview of prerequisites for effective strategic use of data strategy, as well as common pitfalls
- A repeatable process for identifying and removing data constraints
- The importance of balancing business operation and innovation
Presentation of my talk given in Phoenix Data Conference 2019. In this we will look at challenges with current Apache Hadoop ecosystem
Apache Hadoop is still relevant but way of doing Hadoop and enterprise data architecture has to be re-looked as we enter Cognitive and Cloud Native Era
We need
Architecture that is enabled by common run time layer across on premise and cloud
Architecture that can abstract away dependency and version conflicts with tons of open source machine learning out there. Yarn did not scale up in that aspect until one want to deal with multiple conda environment
Architecture that can enable real Hybrid Cloud and Multi Cloud portability
And many more challenges that one has to overcome to keep architecture simple, infrastructure agile and better utilized
This document summarizes a presentation by IDC on big data and high performance data analysis (HPDA). It defines HPDA as combining data-intensive simulation and analytics tasks that require high-performance computing resources. The document outlines several major use cases for HPDA, including fraud detection, health care, and customer analytics. It also profiles specific examples like PayPal's use of HPC for fraud detection and GEICO's pre-calculation of insurance quotes. The document forecasts rapid growth in the HPDA market and notes that new technologies will be required to handle different types of workloads like graph analysis.
Cyberinfrastructure and its Role in ScienceCameron Kiddle
This presentation examines some of the challenges scientists face and describes various cyberinfrastructure technologies that help address these challenges. Example projects employing cyberinfrastructure technologies that we have worked on at the Grid Research Centre, including the GeoChronos project, are also presented. This presentation was given at the IAI International Wireless Sensor Networks Summer School held at the University of Alberta on July 6th, 2009.
Slides of the workshop conducted in Model Engineering College, Ernakulam, and Sree Narayana Gurukulam College, Kadayiruppu Kerala, India in December 2010
IoT to Cloud: Middle Layer (e.g Gateway, Hubs, Fog, Edge Computing)Bob Marcus
The document discusses the role of a middle layer between IoT devices and cloud computing resources. It presents several alternatives for the middle layer, including IoT gateways, edge/fog computing, and multi-level architectures. The optimal approach depends on the use case. For large-scale applications, a multi-level architecture with components at the device, edge, and cloud layers will likely be necessary. The middle layer poses challenges around data processing, communication standards, and extending cloud models to support IoT applications.
This document discusses emerging technologies and trends that will impact research and education networks (NRENs) going forward. It identifies 12 disruptive technologies like mobile internet, cloud computing, and 3D printing that are forecast to greatly impact the global economy. Big data is noted as a primary driver of these changes. The document also discusses implications of globalization, team-based science, and big data - including that NRENs may need to work together more as a global research and education network (GREN) and use software-defined networking (SDN) to help configure networks for different applications and users. It speculates that future NREN nodes could function like content distribution networks (CDNs) and that more computation may occur within
The state of global research data initiatives: observations from a life on th...Projeto RCAAP
This document summarizes the state of global research data initiatives. It discusses that while interest in research data management is growing globally, challenges remain, including lack of advocacy, skills, and incentives. However, it also outlines strengths in many countries through investments in infrastructure and policies. It calls for increased international collaboration and coordination to help manage more research data according to FAIR and open principles.
What makes it worth becoming a Data Engineer?Hadi Fadlallah
This presentation explains what data engineering is for non-computer science students and why it is worth being a data engineer. I used this presentation while working as an on-demand instructor at Nooreed.com
RAPIDS is a suite of open source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs.In this workshop, we will:
1. Introduce Rapids.ai & GPUs
2. Illustrate why GPUs are critical for machine learning and AI applications
3. Demonstrate common machine learning algorithms such as Regression, KNN,SGD etc. using RAPIDS on the QuSandbox
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Chris Jang
This document discusses Google Cloud Platform and its data and analytics capabilities. It begins by explaining the evolution of cloud computing models from virtualized data centers to true on-demand cloud services. It then highlights some of Google Cloud Platform's key differentiators like true cloud economics, future-proof infrastructure, access to innovation, and Google-grade security. The document provides overviews of Google Cloud Platform's storage, database, big data, and machine learning offerings and common use cases for each. It also showcases some of Google's innovations in data analytics and machine learning technologies.
This document discusses the opportunities and challenges of using cloud computing technologies in research. It begins with an overview of cloud computing, including the three layers of cloud services. It then explores how researchers can leverage various cloud applications, platforms, and infrastructures. However, it also notes several new ethical issues that arise regarding subject privacy, data security, ownership and control. The document suggests researchers and IRBs face conceptual gaps and policy vacuums in dealing with these issues as cloud technologies continue to evolve rapidly. It emphasizes the need for education, guidance and careful consideration of terms of service agreements.
Similar to Big Data and Clouds: Research and Education (20)
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Geoffrey Fox
“Next Generation Grid – HPC Cloud” proposes a toolkit capturing current capabilities of Apache Hadoop, Spark, Flink and Heron as well as MPI and Asynchronous Many Task systems from HPC. This supports a Cloud-HPC-Edge (Fog, Device) Function as a Service Architecture. Note this "new grid" is focussed on data and IoT; not computing. Use interoperable common abstractions but multiple polymorphic implementations.
High Performance Computing and Big Data Geoffrey Fox
This document proposes a hybrid software stack that combines large-scale data systems from both research and commercial applications. It runs the commodity Apache Big Data Stack (ABDS) using enhancements from High Performance Computing (HPC) to improve performance. Examples are given from bioinformatics and financial informatics. Parallel and distributed runtimes like MPI, Storm, Heron, Spark and Flink are discussed, distinguishing between parallel (tightly-coupled) and distributed (loosely-coupled) systems. The document also discusses optimizing Java performance and differences between capacity and capability computing. Finally, it explains how this HPC-ABDS concept allows convergence of big data, big simulation, cloud and HPC systems.
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...Geoffrey Fox
Within the last few years, there have been significant contributions to Java-based big data frameworks and libraries such as Apache Hadoop, Spark, and Storm. While these systems are rich in interoperability and features, developing high performance big data analytic applications is challenging. Also, the study of performance characteristics and high performance optimizations is lacking in the literature for these applications. By contrast, these features are well documented in the High Performance Computing (HPC) domain and some of the techniques have potential performance benefits in the big data domain as well. This paper identifies a class of machine learning applications with significant computation and communication as a yardstick and presents five optimizations to yield high performance in Java big data analytics. Also, it incorporates these optimizations in developing SPIDAL Java - a highly optimized suite of Global Machine Learning (GML) applications. The optimizations include intra-node messaging through memory maps over network calls, improving cache utilization, reliance on processes over threads, zero garbage collection, and employing offheap buffers to load and communicate data. SPIDAL Java demonstrates significant performance gains and scalability with these techniques when running on up to 3072 cores in one of the latest Intel Haswell-based multicore clusters.
http://dsc.soic.indiana.edu/publications/hpc2016-spidal-high-performance-submit-18-public.pdf
http://dsc.soic.indiana.edu/presentations/SPIDALJava.pptx
5th Multicore World
15-17 February 2016 – Shed 6, Wellington, New Zealand
http://openparallel.com/multicore-world-2016/
We start by dividing applications into data plus model components and classifying each component (whether from Big Data or Big Simulations) in the same way. These leads to 64 properties divided into 4 views, which are Problem Architecture (Macro pattern); Execution Features (Micro patterns); Data Source and Style; and finally the Processing (runtime) View.
We discuss convergence software built around HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack) http://hpc-abds.org/kaleidoscope/ and show how one can merge Big Data and HPC (Big Simulation) concepts into a single stack.
We give examples of data analytics running on HPC systems including details on persuading Java to run fast.
Some details can be found at http://dsc.soic.indiana.edu/publications/HPCBigDataConvergence.pdf
DTW: 2015 Data Teaching Workshop – 2nd IEEE STC CC and RDA Workshop on Curricula and Teaching Methods in Cloud Computing, Big Data, and Data Science
as part of CloudCom 2015 (http://2015.cloudcom.org/), Vancouver, Nov 30-Dec 3, 2015.
Discusses Indiana University Data Science Program and experience with online education; the program is available in both online and residential modes. We end by discussing two classes taught both online and residentially and online by Geoffrey Fox. One is BDAA: Big Data Applications & Analytics; The other is BDOSSP: Big Data Open Source Software and Projects. Links are
http://openedx.scholargrid.org/ BDAA Fall 2015
http://datascience.scholargrid.org/ BDOSSP Spring 2016
http://bigdataopensourceprojects.soic.indiana.edu/ Spring 2015
High Performance Processing of Streaming DataGeoffrey Fox
Describes two parallel robot planning algorithms implemented with Apache Storm on OpenStack -- SLAM (Simultaneous Localization & Mapping) and collision avoidance. Performance (response time) studied and improved as example of HPC-ABDS (High Performance Computing enhanced Apache Big Data Software Stack) concept.
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Geoffrey Fox
Describes relations between Big Data and Big Simulation Applications and how this can guide a Big Data - Exascale (Big Simulation) Convergence (as in National Strategic Computing Initiative) and lead to a "complete" set of Benchmarks. Basic idea is to view use cases as "Data" + "Model"
Visualizing and Clustering Life Science Applications in Parallel Geoffrey Fox
HiCOMB 2015 14th IEEE International Workshop on
High Performance Computational Biology at IPDPS 2015
Hyderabad, India. This talk covers parallel data analytics for bioinformatics. Messages are
Always run MDS. Gives insight into data and performance of machine learning
Leads to a data browser as GIS gives for spatial data
3D better than 2D
~20D better than MSA?
Clustering Observations
Do you care about quality or are you just cutting up space into parts
Deterministic Clustering always makes more robust
Continuous clustering enables hierarchy
Trimmed Clustering cuts off tails
Distinct O(N) and O(N2) algorithms
Use Conjugate Gradient
Lessons from Data Science Program at Indiana University: Curriculum, Students...Geoffrey Fox
Invited talk at NSF/TCPP Workshop on Parallel and Distributed Computing Education Edupar at IPDPS 2015 May 25, 2015 5/25/2015 Hyderabad
Discusses Indiana University Data Science Program and experience with online education; the program is available in both online and residential modes. We end by discussing two classes taught both online and residentially and online by Geoffrey Fox. One is BDAA: Big Data Applications & Analytics https://bigdatacourse.appspot.com/course. The other is BDOSSP: Big Data Open Source Software and Projects http://bigdataopensourceprojects.soic.indiana.edu/
Data Science Curriculum at Indiana UniversityGeoffrey Fox
The document provides details about the Data Science curriculum at Indiana University. It discusses the background of the School of Informatics and Computing, including its establishment and inclusion of computer science, library and information science programs. It then describes the Data Science certificate and masters programs, including course requirements, tracks, and admissions. The programs aim to provide students with skills in data analysis, lifecycle, management, and applications through coursework in relevant technical areas.
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...Geoffrey Fox
Advances in high-performance/parallel computing in the 1980's and 90's was spurred by the development of quality high-performance libraries, e.g., SCALAPACK, as well as by well-established benchmarks, such as Linpack.
Similar efforts to develop libraries for high-performance data analytics are underway. In this talk we motivate that such benchmarks should be motivated by frequent patterns encountered in high-performance analytics, which we call Ogres.
Based upon earlier work, we propose that doing so will enable adequate coverage of the "Apache" bigdata stack as well as most common application requirements, whilst building upon parallel computing experience.
Given the spectrum of analytic requirements and applications, there are multiple "facets" that need to be covered, and thus we propose an initial set of benchmarks - by no means currently complete - that covers these characteristics.
We hope this will encourage debate
Experience with Online Teaching with Open Source MOOC TechnologyGeoffrey Fox
This memo describes experiences with online teaching in Spring Semester 2014. We discuss the technologies used and the approach to teaching/learning.
This work is based on Google Course Builder for a Big Data overview course
We present a software model built on the Apache software stack (ABDS) that is well used in modern cloud computing, which we enhance with HPC concepts to derive HPC-ABDS.
We discuss layers in this stack
We give examples of integrating ABDS with HPC
We discuss how to implement this in a world of multiple infrastructures and evolving software environments for users, developers and administrators
We present Cloudmesh as supporting Software-Defined Distributed System as a Service or SDDSaaS with multiple services on multiple clouds/HPC systems.
We explain the functionality of Cloudmesh as well as the 3 administrator and 3 user modes supported
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development. However the same is not so true for data intensive problems even though commercial clouds presumably devote more resources to data analytics than supercomputers devote to simulations. We try to establish some principles that allow one to compare data intensive architectures and decide which applications fit which machines and which software.
We use a sample of over 50 big data applications to identify characteristics of data intensive applications and propose a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks. We consider hardware from clouds to HPC. Our software analysis builds on the Apache software stack (ABDS) that is well used in modern cloud computing, which we enhance with HPC concepts to derive HPC-ABDS.
We illustrate issues with examples including kernels like clustering, and multi-dimensional scaling; cyberphysical systems; databases; and variants of image processing from beam lines, Facebook and deep-learning.
Comparing Big Data and Simulation Applications and Implications for Software ...Geoffrey Fox
At eScience in the Cloud 2014, Redmond WA, April 30 2014
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development.
However the same is not so true for data intensive, even though commercially clouds devote much more resources to data analytics than supercomputers devote to simulations.
We look at a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures.
We suggest a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks.
Our analysis builds on combining HPC and the Apache software stack that is well used in modern cloud computing.
Initial results on Azure and HPC Clusters are presented
High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development.
However the same is not so true for data intensive even though commercially clouds devote many more resources to data analytics than supercomputers devote to simulations.
Here we use a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures.
We propose a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks.
Our analysis builds on the Apache software stack that is well used in modern cloud computing.
We give some examples including clustering, deep-learning and multi-dimensional scaling.
One suggestion from this work is value of a high performance Java (Grande) runtime that supports simulations and big data
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Geoffrey Fox
Keynote at Sixth International Workshop on Cloud Data Management CloudDB 2014 Chicago March 31 2014.
Abstract: We introduce the NIST collection of 51 use cases and describe their scope over industry, government and research areas. We look at their structure from several points of view or facets covering problem architecture, analytics kernels, micro-system usage such as flops/bytes, application class (GIS, expectation maximization) and very importantly data source.
We then propose that in many cases it is wise to combine the well known commodity best practice (often Apache) Big Data Stack (with ~120 software subsystems) with high performance computing technologies.
We describe this and give early results based on clustering running with different paradigms.
We identify key layers where HPC Apache integration is particularly important: File systems, Cluster resource management, File and object data management, Inter process and thread communication, Analytics libraries, Workflow and Monitoring.
See
[1] A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures, Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha and Geoffrey Fox, accepted in IEEE BigData 2014, available at: http://arxiv.org/abs/1403.1528
[2] High Performance High Functionality Big Data Software Stack, G Fox, J Qiu and S Jha, in Big Data and Extreme-scale Computing (BDEC), 2014. Fukuoka, Japan. http://grids.ucs.indiana.edu/ptliupages/publications/HPCandApacheBigDataFinal.pdf
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC Geoffrey Fox
This proposes an integration of HPC and Apache Technologies. HPC-ABDS+ Integration areas include
File systems,
Cluster resource management,
File and object data management,
Inter process and thread communication,
Analytics libraries,
Workflow
Monitoring
Classification of Big Data Use Cases by different FacetsGeoffrey Fox
Ogres classify Big Data applications by multiple facets – each with several exemplars and features. This gives a
guide to breadth and depth of Big Data and allows one to examine which ogres a particular architecture/software support.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
6. https://portal.futuregrid.org
Some Data sizes
~40 109 Web pages at ~300 kilobytes each = 10
Petabytes
LHC 15 petabytes per year
Radiology 69 petabytes per year
Square Kilometer Array Telescope will be 100
terabits/second; LSST Survey >20TB per day
Earth Observation becoming ~4 petabytes per year
Earthquake Science – few terabytes total today
PolarGrid – 100’s terabytes/year becoming petabytes
Exascale simulation data dumps – terabytes/second
Deep Learning to train self driving car; 100 million
megapixel images ~ 100 terabytes
6
19. https://portal.futuregrid.org
Clouds & Data Intensive Applications
• Applications tend to be new and so can consider emerging
technologies such as clouds
• Do not have lots of small messages but rather large reduction (aka
Collective) operations
– New optimizations e.g. for huge messages
• “Large Scale Optimization”: Deep Learning, Social Image
Organization, Clustering and Multidimensional Scaling which are
variants of EM
• EM (expectation maximization) tends to be good for clouds and
Iterative MapReduce
– Quite complicated computations (so compute largish compared to
communicate)
– Communication is Reduction operations (global sums or linear) or Broadcast
• Machine Learning has FULL Matrix kernels
19
38. https://portal.futuregrid.org
Massive Open Online Courses (MOOC)
• MOOC’s are very “hot” these days with Udacity and
Coursera as start‐ups; perhaps over 100,000 participants
• Relevant to Data Science (where IU is preparing a MOOC)
as this is a new field with few courses at most universities
• Typical model is collection of short prerecorded segments
(talking head over PowerPoint) of length 3‐15 minutes
• These “lesson objects” can be viewed as “songs”
• Google Course Builder (python open source) builds
customizable MOOC’s as “playlists” of “songs”
• Tells you to capture all material as “lesson objects”
• We are aiming to build a repository of many “songs”; used
in many ways – tutorials, classes …
38
41. https://portal.futuregrid.org
Customizable MOOC’s
• We could teach one class to 100,000 students or 2,000 classes to 50
students
• The 2,000 class choice has 2 useful features
– One can use the usual (electronic) mentoring/grading technology
– One can customize each of 2,000 classes for a particular audience given their
level and interests
– One can even allow student to customize – that’s what one does in making
play lists in iTunes
– Flipped Classroom
• Both models can be supported by a repository of lesson objects (3‐
15 minute video segments) in the cloud
• The teacher can choose from existing lesson objects and add their
own to produce a new customized course with new lessons
contributed back to repository
41
45. https://portal.futuregrid.org
Conclusions• Data Intensive programs are not like simulations as they have large
“reductions” (“collectives”) and do not have many small messages
– Clouds suitable and in fact HPC sometimes optimal
• Iterative MapReduce an interesting approach; need to optimize collectives
for new applications (Data analytics) and resources (clouds, GPU’s …)
• Need an initiative to build scalable high performance data analytics library
on top of interoperable cloud‐HPC platform
– Full matrices important
• More employment opportunities in clouds than HPC and Grids and in data
than simulation; so cloud and data related activities popular with students
• Community activity to discuss data science education
– Agree on curricula; is such a degree attractive?
• Role of MOOC’s for either
– Disseminating new curricula
– Managing course fragments that can be assembled into custom courses
for particular interdisciplinary students
45