This was a 30 min talk intended as one of the opening/overview presentations before a full-day deep dive into ScienceDMZ design patterns and architectures.
Direct downloads are not enabled. Contact me directly (chris@bioteam.net) if, for some odd reason, you want a copy of this slide deck!
This is a custom "Bio IT trends/problems" deck that I did for a general but highly technical audience at the 2014 Internet2 Technology Exchange conference.
Download of the raw PPT is disabled; contact me at chris@bioteam.net if a direct copy or PDF of the presentation would be useful.
This is a very short slide deck I did for a 10-minute slot on a http://pistoiaalliance.org/ webinar. The slides do not fully cover what I intend to talk about, so if the webinar is recorded and available afterwards I'll update this description with the recording URL.
PDF copy of the slides available upon request ("chris@bioteam.net")
Talk slides from my annual address at the Bio-IT World Expo & Conference where I cover trends, best practices and emerging pain points for life science focused HPC, scientific computing and "research IT"
Email "chris@bioteam.net" if you want a PDF copy of these slides. I've disabled the raw powerpoint download option on slideshare.
BioIT World 2016 - HPC Trends from the Trenches - Chris Dagdigian
As presented at BioIT World 2016. In one of the more popular presentations of the Expo, Chris delivers a candid assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. He’ll cover what has changed (or not) in the past year around infrastructure, storage, computing, and networks. This presentation will help you understand IT to build and support data intensive science.
Video link from the presentation: biote.am/bs
[Note: email chris@bioteam.net if you would like a PDF copy of this presentation]
Taming Big Science Data Growth with Converged Infrastructure - The BioTeam Inc.
2014 BioIT World Expo presentation
"Many of the largest NGS sites have identified IO bottlenecks as their number one concern in growing their infrastructure to support current and projected data growth rates. In this talk, Aaron D. Gardner, Senior Scientific Consultant, BioTeam, Inc., will share real-world strategies and implementation details for building converged storage infrastructure to support the performance, scalability and collaborative requirements of today's NGS workflows."
For a copy of this presentation please email: chris@bioteam.net
Mapping Life Science Informatics to the Cloud - Chris Dagdigian
Infrastructure cloud platforms such as those offered by Amazon Web Services are not designed and built with scientific research as the primary use case. These presentation slides cover the current state of mapping life science research and HPC technique onto “the cloud” and how to work around the common engineering, orchestration and data movement problems.
[Note: I've replaced the 2011 version of this talk deck with a slightly updated version as delivered at the AIRI Petabyte Challenge Meeting]
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting - Chris Dagdigian
October 2013 "Beyond the Genome" presentation slides. Talk is mostly focused on issues around IaaS cloud usage for "Bio-IT" and life science informatics & scientific computing.
PDF slides available directly - please email chris@bioteam.net for slides
Facilitating Collaborative Life Science Research in Commercial & Enterprise E... - Chris Dagdigian
This is a talk I put together for a http://www.neren.org/ seminar called "Bridging the Gap: Research Facilitation". Tried to give a biotech/pharma view for a mostly academic audience.
2014 BioIT World - Trends from the trenches - Annual presentation - Chris Dagdigian
Talk slides from the annual "trends from the trenches" address at BioITWorld Expo. 2014 Edition.
### Email chris@bioteam.net if you'd like a PDF copy of this deck ###
BioITWorld 2013 presentation - Best practices for building multi-tenant HPC clusters for Pharma/BioTech
Essentially a mini case study of a recent deployment of a multi-petabyte, 1000+ CPU core Linux cluster in the Boston area.
Please email me at: chris@bioteam.net if you would like the actual PDF file itself.
This is a massive slide deck I used as the starting point for a 1.5 hour talk at the 2012 www.nerlscd.org conference. Mixture of old and (some) new slides from my usual stuff.
Cloud Sobriety for Life Science IT Leadership (2018 Edition) - Chris Dagdigian
Candid/blunt AWS advice for research IT and life science IT leadership. Hard lessons learned from many years of AWS consulting. Contact dag@bioteam.net if you want a PDF copy of this presentation
Annual address covering trends, emerging requirements, pain points and infrastructure issues in the "Bio-IT" aka life science informatics and HPC realm; Email me if you want a PDF of this talk - chris@bioteam.net
Bio-IT Trends From The Trenches (digital edition) - Chris Dagdigian
Note: Contact me directly dag@bioteam.net if you would like a PDF download of these slides
This is Chris Dagdigian's 10th year delivering his no-holds-barred, candid state of the industry address at BioIT World, and we are not going to let a pandemic stop him.
Instead of his typical talk, five distinguished panelists will join Chris for a spirited discussion on Current Events and Scientific Computing and the impacts of the COVID-19 Pandemic:
Tiny slide deck from a 5-min lightning talk covering a recent project involving live replication of 2-petabytes of scientific data.
Please leave feedback if you'd like to see this as a long-form technical blog article or conference talk, thanks!
Disruptive Innovation: how do you use these theories to manage your IT? - Mark Madsen
The term disruptive innovation was popularized by Harvard professor Clayton Christensen in his 1997 book “The Innovator’s Dilemma.” Nearly 20 years later “Disrupt!” is a popular leadership mantra that is more frequently uttered than experienced. You can't productize it. You can't always control it – at least what effects it has in practice. You aren't necessarily going to like every product of innovation. So are you sure you want it? If so, how do you promote a culture in which innovation can flower – and, potentially, thrive? Because that's probably the best that you can do.
Perhaps there's a better framing for innovation than just "disruption." This session is an overview of commoditization and innovation theories, followed by basic things you can do to apply that theory to your daily job architecting, choosing and managing a data environment in your company.
Everything Has Changed Except Us: Modernizing the Data Warehouse - Mark Madsen
Keynote, Munich, June 2016
The way we make decisions has changed. The data we use has changed. The techniques we can apply to data and decisions have changed. Yet what we build and how we build it has barely changed in 20 years.
The definition of madness is doing more of what you already do and expecting different results. The threat to the data warehouse is not from new technology that will replace the data warehouse. It is from destabilization caused by new technology as it changes the architecture, and from failure to adapt to those changes.
The technology that we use is problematic because it constrains and sometimes prevents necessary activities. We don’t need more technology and bigger machines. We need different technology that does different things. More product features from the same vendors won’t solve the problem.
The data we want to use is challenging. We can’t model and clean and maintain it fast enough. We don’t need more data modeling to solve this problem. We need less modeling and more metadata.
And lastly, a change in scale has occurred. It isn’t a simple problem of “big”. The problem with current workloads has been solved, despite the performance problems that many people still have today. Scale has many dimensions – important among them are the number of discrete sources and structures, the rate of change of individual structures, the rate of change in data use, the variety of uses and the concurrency of those uses.
In short, we need new architecture that is not focused on creating stability in data, but one that is adaptable to continuous and rapidly changing uses of data.
BI isn't big data and big data isn't BI (updated) - Mark Madsen
Big data is hyped, but isn't hype. There are definite technical, process and business differences in the big data market when compared to BI and data warehousing, but they are often poorly understood or explained. BI isn't big data, and big data isn't BI. By distilling the technical and process realities of big data systems and projects we can separate fact from fiction. This session examines the underlying assumptions and abstractions we use in the BI and DW world, the abstractions that evolved in the big data world, and how they are different. Armed with this knowledge, you will be better able to make design and architecture decisions. The session is sometimes conceptual, sometimes detailed technical explorations of data, processing and technology, but promises to be entertaining regardless of the level.
Yes, it’s about the data normally called “big”, but it’s not Hadoop for the database crowd, despite the prominent role Hadoop plays. The session will be technical, but in a technology preview/overview fashion. I won’t be teaching you to write MapReduce jobs or anything of the sort.
The first part will be an overview of the types, formats and structures of data that aren’t normally in the data warehouse realm. The second part will cover some of the basic technology components, vendors and architecture.
The goal is to provide an overview of the extent of data available and some of the nuances or challenges in processing it, coupled with some examples of tools or vendors that may be a starting point if you are building in a particular area.
Briefing room: An alternative for streaming data collection - Mark Madsen
Knowing what’s happening in your enterprise right now can mark the difference between success and failure. The key is to have a rich view of activity, such that analysts and others can explore in a fully multidimensional fashion. Benefiting from such a detailed perspective can help professionals identify the exact nature of problems or opportunities, thus enabling precise actions that make a difference quickly.
Register for this episode of The Briefing Room to hear veteran Analyst Mark Madsen of Third Nature explain how a nexus of innovations for analyzing network traffic can help companies stay on top of their game. He’ll be briefed by Erik Giesa of ExtraHop, who will showcase his company’s stream analytics technology for wire data, which provides real-time, multidimensional views of network traffic. He’ll share success stories of how ExtraHop has solved otherwise intractable problems and enabled a new level of root-cause analysis.
Data lakes, data exhaust, web scale, data is the new oil. Vendors are throwing new terms and analogies at us to convince us to buy their products as the market around data technologies grows. We change data persistence and transaction layers because "databases don't scale" or because data is "unstructured". If data had no structure then it wouldn't be data, it would be noise. Schema on read, schema on write, schemaless databases; they imply structure underlying the data. All data has schema, but that word may not mean what you think it means.
This presentation will describe concepts of data storage and retrieval from technology prehistory (i.e. before the 1980s) and examine the design principles behind both old and new technology for managing data because sometimes post-relational is actually pre-relational. It is important to separate what is identical to things that were tried in the past from new twists on old topics that deliver new capabilities.
Directly related to these topics are performance, scalability and the realities of what organizations do with data over time. All of these topics should guide architecture decisions to avoid the trap of creating technical debts that must be paid later, after systems are in place and change is difficult.
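The schema-on-read versus schema-on-write distinction the abstract above touches on can be made concrete with a toy sketch (the function and field names here are illustrative, not from the talk): schema-on-write validates structure before anything is stored, while schema-on-read stores raw records and imposes structure only at query time.

```python
import json

# Schema-on-write: validate/shape the record before storing it.
# A malformed record is rejected up front, so reads can trust the data.
def write_with_schema(store, record, required_fields=("id", "name")):
    missing = [f for f in required_fields if f not in record]
    if missing:
        raise ValueError(f"rejected at write time, missing: {missing}")
    store.append(json.dumps(record))

# Schema-on-read: the store holds raw text; structure is imposed only
# when a query asks for a particular field.
def read_with_schema(store, field, default=None):
    return [json.loads(raw).get(field, default) for raw in store]

store = []
write_with_schema(store, {"id": 1, "name": "sensor-a", "temp": 21.5})
names = read_with_schema(store, "name")  # structure applied here, at read time
```

Either way, some schema exists; the design choice is only about when it is enforced and who pays the cost of a mismatch.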
IT Performance Management Handbook for CIOs - Vikram Ramesh
Learn why measuring performance on individual devices and systems often leaves admins flying blind when it comes to SLA management and identifying performance bottlenecks. This in-depth e-Guide talks about how VirtualWisdom4 can give administrators a live, up-to-the-second view across the system-wide IT infrastructure.
Innovation with big data - Chr. Hansen's experiences - Microsoft
In many places, Big Data is still the new and unknown, with no top priority at IT, since "we don't have large data volumes." But Big Data is much more than large data volumes. At Chr. Hansen A/S, the Research and Development (Innovation) department has worked with the value of data and, as a result, established a cross-disciplinary BioInformatics program built on Big Data technologies from Microsoft.
(PDF available upon request). This is an updated version of the 2012 BioITWorld Boston talk that I gave 6 weeks later at Bio IT World Asia in June 2012. Some slide content was updated and revised, and I also deleted a number of slides in an attempt to shorten the talk since I'm known to speak fast. There was legitimate concern I'd be unintelligible to non-native English speakers!
Talk slides as delivered at the 2012 Bio-IT World Conference in Boston, MA
This is my annual "state of the state" address that has become somewhat popular.
Big data is everywhere, although sometimes we may not immediately realize it. Most of us don't deal with large amounts of data in our daily lives except in unusual circumstances. Lacking this immediate experience, we often fail to understand both the opportunities and the challenges presented by big data. A number of issues and challenges remain in addressing these characteristics going forward.
General overview of the Big Data Concept.
Presentation of the Hierarchical Linear Subspace Indexing Method to perform exact similarity search in high dimensional data
Introduction to Big Data (non-technical) and the importance of Data Science to create meaning.
First, we define Big Data in the light of the 3 Vs: volume, velocity and variety; next we move on to redefine Big Data and touch on the topic of a data lake. We envision that Big Data will become mainstream for small organisations as well, and we cover what we can do with Big Data, how to tackle Big Data projects, what challenges lie ahead, and what opportunities there are to reap. And of course, how important data science is for finding the meaning in all the data.
High-Performance Networking Use Cases in Life Sciences - Ari Berman
Big data has arrived in the life science research domain and has driven the need for optimized high-performance networks in these research environments. Many petabytes of data transfer, storage and analytics are now a reality because data is being produced cheaply and rapidly at unprecedented rates in academic, commercial and clinical laboratories. These data flows are complicated by the combination of high-frequency mice flows and high-volume elephant flows, sometimes from the same application operating in parallel environments. Additional complicating factors include collaborative research efforts on large data stores that utilize both common and disparate compute resources, the need for high-performance in-flight data encryption to cover the transmission and handling of clinical data, and the relatively poor state of algorithm development from an IO standpoint throughout the industry. This presentation will cover representative advanced networking use cases from life sciences research, the challenges they present in networking environments, some solutions being deployed within both small and large institutions, and an overview of a few of the unresolved problems to date.
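The mice-versus-elephant flow distinction mentioned above is commonly operationalized by a simple volume threshold: short, frequent control-plane flows are "mice," while long-lived, high-volume dataset transfers are "elephants." A minimal sketch (the 10 GiB cutoff and flow names are hypothetical, not from the talk):

```python
# Classify observed network flows as "mouse" (short, frequent) or
# "elephant" (long-lived, high-volume) by a byte-count threshold.
ELEPHANT_BYTES = 10 * 1024**3  # 10 GiB cutoff; an assumed, tunable value

def classify_flows(flows):
    """flows: iterable of (flow_id, total_bytes) tuples."""
    return {fid: ("elephant" if nbytes >= ELEPHANT_BYTES else "mouse")
            for fid, nbytes in flows}

labels = classify_flows([
    ("dns-lookup", 512),               # tiny control-plane flow
    ("genome-transfer", 2 * 1024**4),  # 2 TiB dataset movement
])
```

Real deployments often use such a classification to steer elephant flows onto dedicated high-throughput paths (e.g. a Science DMZ) so they do not starve the latency-sensitive mice.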
Big Data brings big promise and also big challenges, the primary and most important one being the ability to deliver Value to business stakeholders who are not data scientists!
The Next-Generation sequencing data-deluge requires storage and compute services to be provisioned at an ever-increasing rate. Can Cloud (and last decade's buzzword, Grid), help us?
Talk given at the NHGRI Cloud computing workshop, 2010.
Big Data, NoSQL, NewSQL & The Future of Data ManagementTony Bain
It is an exciting and interesting time to be involved in data. More influential change has occurred in database management in the last 18 months than in the previous 18 years. New technologies such as NoSQL and Hadoop, and radical redesigns of existing technologies such as NewSQL, will dramatically change how we manage data moving forward.
These technologies bring with them possibilities both in terms of the scale of data retained but also in how this data can be utilized as an information asset. The ability to leverage Big Data to drive deep insights will become a key competitive advantage for many organisations in the future.
Join Tony Bain as he takes us through both the high level drivers for the changes in technology, how these are relevant to the enterprise and an overview of the possibilities a Big Data strategy can start to unlock.
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
Cloud Computing Evolution
Why is Cloud Computing needed?
Cloud Computing Models
Cloud Solutions
Cloud job opportunities
Criteria for Big Data
Big Data challenges
Technologies to process Big Data- Hadoop
Hadoop History and Architecture
Hadoop Eco-System
Hadoop Real-time Use cases
Hadoop job opportunities
Hadoop and SAP HANA integration
Summary
Watch full webinar here: https://bit.ly/2Y0vudM
What is Data Virtualization and why should I care? In this webinar we intend to help you understand not only what Data Virtualization is, but why it is a critical component of any organization's data fabric and how it fits in. Data virtualization liberates and empowers your business users via data discovery and data wrangling, through to the generation of reusable reporting objects and data services. Digital transformation demands that we empower all consumers of data within the organization, and it demands agility too. Data Virtualization gives you meaningful access to information that can be shared by a myriad of consumers.
Register to attend this session to learn:
- What is Data Virtualization?
- Why do I need Data Virtualization in my organization?
- How do I implement Data Virtualization in my enterprise?
How to select a modern data warehouse and get the most out of it?Slim Baltagi
In the first part of this talk, we will give a setup and definition of modern cloud data warehouses as well as outline problems with legacy and on-premise data warehouses.
We will speak to selecting, technically justifying, and practically using modern data warehouses, including criteria for picking a cloud data warehouse, where to start, and how to use it optimally and cost-effectively.
In the second part of this talk, we discuss the challenges and where people are not getting a return on their investment. In this business-focused track, we cover how to get business engagement, how to identify the business cases/use cases, and how to leverage data-as-a-service and consumption models.
Modern Data Integration Expert Session Webinar ibi
William McKnight, President of McKnight Consulting Group and Information Builders’ Jake Freivald discuss the tools needed for a successful modern data integration.
Information Builders provides the industry’s most scalable software solutions for data management and analytics. We help organizations operationalize and monetize their data through insights that drive action. Our integrated platform for BI, analytics, data integration, and data quality, combined with our proven expertise, delivers value faster, with less risk. We believe data and analytics are the drivers of digital transformation, and we’re on a mission to help our customers capitalize on new opportunities in the connected world. Information Builders is headquartered in New York, NY, with global offices, and remains one of the largest privately held companies in the industry.
ER(Entity Relationship) Diagram for online shopping - TAEHimani415946
https://bit.ly/3KACoyV
The ER diagram for the project is the foundation for the building of the database of the project. The properties, datatypes, and attributes are defined by the ER diagram.
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesSanjeev Rampal
Talk presented at Kubernetes Community Day, New York, May 2024.
Technical summary of Multi-Cluster Kubernetes Networking architectures with focus on 4 key topics.
1) Key patterns for Multi-cluster architectures
2) Architectural comparison of several OSS/CNCF projects to address these patterns
3) Evolution trends for the APIs of these projects
4) Some design recommendations & guidelines for adopting/deploying these solutions.
1.Wireless Communication System_Wireless communication is a broad term that i...JeyaPerumal1
Wireless communication involves the transmission of information over a distance without the help of wires, cables or any other forms of electrical conductors.
Wireless communication is a broad term that incorporates all procedures and forms of connecting and communicating between two or more devices using a wireless signal through wireless communication technologies and devices.
Features of Wireless Communication
The evolution of wireless technology has brought many advancements with its effective features.
The transmitted distance can be anywhere between a few meters (for example, a television's remote control) and thousands of kilometers (for example, radio communication).
Wireless communication can be used for cellular telephony, wireless access to the internet, wireless home networking, and so on.
3. 3
Chris & Ari: Why 2 of Us Today?
Answer: Ari concentrates on Federal/US.Gov while I deal mostly
with commercial biotech/pharma, EDU and non-profit Orgs.
They are very different.
8. This data will be moving constantly …
Illumina HiSeq x 10
‣ Raw Instrument Data
• +13 TB every 3 days
‣ FASTQ Conversion
• +8 TB every 3 days
‣ Align -> Compressed BAM
• +2 TB every three days
‣ Data Distribution
• ?
8
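The per-3-day rates above annualize into striking totals. A back-of-envelope sketch (rates taken from the slide; assuming idealized continuous operation, which is an assumption, not a vendor figure):

```python
# Annualize the per-3-day output rates quoted above for a single
# HiSeq X Ten installation (idealized: assumes continuous operation).
rates_tb_per_cycle = {
    "raw instrument data": 13,
    "FASTQ conversion": 8,
    "compressed BAM": 2,
}
cycles_per_year = 365 / 3  # one 3-day production cycle, back to back

annual_tb = {stage: tb * cycles_per_year for stage, tb in rates_tb_per_cycle.items()}
total_pb = sum(annual_tb.values()) / 1024  # using 1 PB = 1024 TB

for stage, tb in annual_tb.items():
    print(f"{stage}: ~{tb:,.0f} TB/year")
print(f"total: ~{total_pb:.1f} PB/year")
```

Even before any distribution copies, that is roughly 2.7 PB of new data per year from one instrument installation - which is why "peta-scale" stops being exceptional.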
9. 9
Coming Soon To a Researcher Near You:
USB-attached genomic sequencing
Gulp.
10. 10
Tipping Point #1
Effort/cost of generating or acquiring vast piles of data
in 2015 is far less than real world cost of storing and
managing that data through a realistic lifecycle.
11. 11
Tipping Point #2
Scientists still believe storage is cheap & near-infinite.
Data triage no longer sufficient. Scientists rarely asked
to articulate a scientific/business case for storage.
12. 12
Tipping Point #3
Centralized infrastructure models are not sufficient and
must be modified. Data & compute WILL span sites and
locations with or without active IT involvement.
We need to start preparing now.
14. 14
“Center Of Gravity” Problem
Current methods involving centralized storage and bringing
“users” and “compute” very close “… to the data” are going
to face significant problems in 2015 and beyond.
15. 15
“Center Of Gravity” Pain #1
Terabyte class instruments. Everywhere. Gulp.
We cannot stop this trend - large-scale data generation will span labs,
buildings, campuses & WANs
16. 16
“Center Of Gravity” Pain #2
Collaborations & Peta-scale Open Access Data
The future of large scale genomics|informatics increasingly involves
multi-party / multi-site collaboration. Also: Petabytes of free data (!!)
17. 17
“Center Of Gravity” Pain #3
Object Storage Less Effective @ Single Site
Object storage is the future of scientific data at rest. Some major side
benefits (erasure coding, etc.) can only be realized when 3 or more
sites are involved
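The erasure-coding side benefit is easy to quantify. A sketch with illustrative shard counts (the 10+6 layout is an assumed example, not a product recommendation):

```python
# Raw-capacity overhead: N-way replication vs. erasure coding.
# Shard counts below are illustrative; real layouts vary by product.
def ec_overhead(data_shards, parity_shards):
    """Raw bytes stored per logical byte."""
    return (data_shards + parity_shards) / data_shards

replication_overhead = 3.0   # three full copies, one per site
ec = ec_overhead(10, 6)      # 10 data + 6 parity shards, dispersed
print(f"3-way replication: {replication_overhead:.1f}x raw capacity")
print(f"10+6 erasure code: {ec:.1f}x raw capacity")
```

Dispersed across three or more sites, a layout like 10+6 can be arranged to survive the loss of a whole site at 1.6x raw capacity instead of 3x - the multi-site benefit the slide refers to.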
18. 18
“Center Of Gravity” Summarized
Data spread is unavoidable. Effectively Unstoppable.
We have a WAN-scale data movement/access problem.
There are ~2 viable approaches going forward ...
19. 19
Option 1 - “Stay Centralized”
Still totally viable but much faster connectivity to
instruments & collaborators will be essential
Nutshell: Significant investment in edge/WAN connectivity required,
likely at bandwidths exceeding 10Gbps
20. 20
Option 2 - “Go With The Flow”
Embrace the distributed & “cloudy” future where
compute & storage span multiple zones
Nutshell: Still requires massive bandwidth upgrades to support
metadata-aware or location-aware access & compute
22. 22
Terabyte-scale data movement is
going to be an informatics “grand
challenge” for the next 2-3+ years
And far harder/scarier than previous compute & storage challenges
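To see why, consider the idealized wire-time for a single 13 TB instrument run (a sketch only; the 70% link-efficiency factor is an assumption standing in for protocol overhead and competing traffic):

```python
# Idealized wall-clock time to move a dataset across a WAN link.
# `efficiency` is an assumed fudge factor for protocol overhead and
# competing traffic; real-world transfers often do considerably worse.
def transfer_hours(dataset_tb, link_gbps, efficiency=0.7):
    bits = dataset_tb * 1e12 * 8               # dataset size in bits
    usable_bps = link_gbps * 1e9 * efficiency  # effective throughput
    return bits / usable_bps / 3600            # seconds -> hours

for gbps in (1, 10, 40, 100):
    print(f"13 TB over {gbps:>3} Gbps: ~{transfer_hours(13, gbps):.1f} hours")
```

At 1 Gbps a single run monopolizes the link for the better part of two days; at 10 Gbps it is around four hours - which is why the bandwidth numbers in the options above look the way they do.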
24. Long history of engagement & cooperation
Research IT vs. Enterprise IT
‣ Historically our infrastructure requirements
often surpassed what the Enterprise uses to
sustain day to day operation
‣ We’ve spent ~20 years working closely with
Enterprise IT to enable “data intensive
science”
‣ Relatively easy to align informatics IT
infrastructure with established vendor,
product, technology and architecture
standards
24
25. Barely worth talking about in 2015
25
Computing Power
‣ 32 CPU cores to 60,000
cores - it almost does not
matter
‣ Simple commodity
‣ Interesting & challenging
but not insanely hard.
‣ Easy to acquire & deploy
in 2015 at whatever scale
is needed (budget
permitting)
26. Still a hassle but no longer intractable
26
Storage
‣ Petabyte-capable storage is no
big deal in 2015
‣ Pricing slowly being
commoditized
‣ Many opportunities to do clever
stuff or waste phenomenal
amounts of money
‣ Biggest risk may be research
driving towards object storage
faster than Enterprise is willing
to commit/support
27. Hard but not insurmountable
27
Data Management
‣ Managing scientific data
at rest is still very hard
‣ … but we have seen a
few successful ways
forward
‣ DIY/RDBMS/LIMS
‣ iRODS
‣ Object Storage
30. 30
Issue #1
Current LAN/WAN stacks bad for emerging use cases
Existing technology we’ve used for decades has been architected to
support many small network flows; not a single big data flow
31. 31
Issue #2
Ratio of LAN:WAN bandwidth is out of whack
We will need faster links to “outside” than most organizations have
anticipated or accounted for in long-term technology planning
32. 32
Issue #3
Core, Campus, Edge and “Top of Rack” bandwidth
Enterprise networking types can be *smug* about 10Gbps at the
network core. Boy are they in for a bad surprise.
33. 33
Issue #4
Bigger blast radius when stuff goes wrong
Compute & storage can be logically or physically contained to
minimize disruption/risk when Research does stupid things.
Networks, however, touch EVERYTHING EVERYWHERE. Major risk.
34. 34
What We Need:
- Ludicrous bandwidth @ network core
- Very fast (10-40Gbps) ToR, Edge, Campus links
- 1Gbps - 10Gbps connections to “outside”
- Switches/Routers/Firewalls that can support
small #s of very large data flows
36. 36
Issue #5
Social, trust & cultural issues
We lack the multi-year relationship and track record we’ve built with
facility, compute & storage teams. We are “strangers” to many WAN
and SecurityOps types
37. 37
Issue #6
Our “deep bench” of internal expertise is lacking
Research IT usually has very good “shadow IT” skills but we don’t
have homegrown experts in BGP, Firewalls, Dark Fiber, Routing etc.
39. 39
Issue #7
Cisco. Cisco. Cisco.
The elephant in the room. Cisco rarely 1st choice for greenfield efforts
in this space but Cisco shops often refuse to entertain any
alternatives. Massive existing install base & on-premise expertise
must be balanced, recognized & carefully handled.
40. 40
Issue #8
Firewalls, SecOps & Incumbent Vendors
Legacy security products supporting 10Gbps can cost $150,000+ and
still utterly fail to perform without heroic tuning & deep config magic.
Alternatives exist but massive institutional inertia to overcome.
Deeply Challenging Issue.
42. 42
‣ Peta-scale becoming the norm, not exception
‣ Compute is a commodity; Storage getting there
‣ Historically it has been pretty easy to integrate
“Research Computing” with “Enterprise”
facilities and operational standards
‣ We can no longer assume the majority of our
infrastructure will reside in a single datacenter
43. 43
‣ We need a massive increase in end-to-end
network connectivity & bandwidth
‣ … and kit that can handle large data flows
‣ Current state of “Enterprise” LAN/WAN
networking is not aligned with emerging needs:
‣ Cost, Capability, Performance, Security …
44. 44
‣ New hardware, reference architectures, best
practices and methods will be required
‣ There is no easy path forward …
46. 46
‣ Science DMZ
‣ Only viable reference architecture &
collection of operational practices /
philosophy BioTeam has seen to date
‣ In-use today. Real world. No BS.
‣ High level visibility & support within US.GOV,
grant funding agencies and supporters of
data intensive science and R&E networks
47. 47
‣ If you did not know why you were attending this
workshop today, hopefully you do now!
‣ Enjoy the rest of the talks!