This document provides an overview of Ian Rowson's presentation on selecting and implementing a Museum Collections Management System (CMS). Some key points:
- CMS projects involve significant time and resources, so it is important to minimize risks by following best practices. Rowson outlines seven "golden rules" to help with this.
- Choosing a flexible, standards-compliant system is important to allow for future changes and data exchange. Homegrown databases often fail to meet long-term needs.
- Ensuring you can export data in an open format is essential to avoid being locked into one system forever. Suppliers should demonstrate this capability.
- Getting support from various departments and an experienced supplier can help navigate technical challenges.
- The speaker observes trends in how research infrastructure is changing more rapidly than IT can refresh systems, creating challenges. This includes new instruments generating vastly more data.
- There is a blurring of roles between scientists, sysadmins, and programmers as everything becomes more automated and "scriptable." Sysadmins must learn programming and researchers can now self-provision resources.
- Virtualization is widely used even in HPC to provide flexibility and address business needs. Very large "fat node" servers are replacing clusters of smaller nodes. Local disk is coming back as a hedge against big data requirements.
- Object storage is becoming more viable and approachable on commodity hardware.
Data lakes, data exhaust, web scale, data is the new oil. Vendors are throwing new terms and analogies at us to convince us to buy their products as the market around data technologies grows. We change data persistence and transaction layers because "databases don't scale" or because data is "unstructured". If data had no structure then it wouldn't be data, it would be noise. Schema on read, schema on write, schemaless databases; they imply structure underlying the data. All data has schema, but that word may not mean what you think it means.
This presentation will describe concepts of data storage and retrieval from technology prehistory (i.e. before the 1980s) and examine the design principles behind both old and new technology for managing data because sometimes post-relational is actually pre-relational. It is important to separate what is identical to things that were tried in the past from new twists on old topics that deliver new capabilities.
Directly related to these topics are performance, scalability and the realities of what organizations do with data over time. All of these topics should guide architecture decisions to avoid the trap of creating technical debts that must be paid later, after systems are in place and change is difficult.
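To make the schema-on-write versus schema-on-read contrast concrete, here is a minimal, illustrative Python sketch using only the standard library; the example record and table are invented for illustration and are not taken from the presentation:

```python
# Schema on write vs. schema on read, illustrated with the standard library.
import json
import sqlite3

event = {"user": "alice", "action": "login", "ts": "2016-01-01T12:00:00"}

# Schema on write: the structure is declared and enforced when data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, action TEXT, ts TEXT)")
conn.execute("INSERT INTO events VALUES (?, ?, ?)",
             (event["user"], event["action"], event["ts"]))

# Schema on read: the raw record is stored as-is and a structure is imposed
# only when the data is interpreted later.
raw_log = json.dumps(event)
parsed = json.loads(raw_log)
user, action = parsed["user"], parsed["action"]  # schema applied at read time
```

Either way the data carries a structure; the design question is where and when that structure is applied.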
BI isn't big data and big data isn't BI (updated) – Mark Madsen
Big data is hyped, but isn't hype. There are definite technical, process and business differences in the big data market when compared to BI and data warehousing, but they are often poorly understood or explained. BI isn't big data, and big data isn't BI. By distilling the technical and process realities of big data systems and projects we can separate fact from fiction. This session examines the underlying assumptions and abstractions we use in the BI and DW world, the abstractions that evolved in the big data world, and how they are different. Armed with this knowledge, you will be better able to make design and architecture decisions. The session is sometimes conceptual, sometimes detailed technical explorations of data, processing and technology, but promises to be entertaining regardless of the level.
Yes, it’s about the data normally called “big”, but it’s not Hadoop for the database crowd, despite the prominent role Hadoop plays. The session will be technical, but in a technology preview/overview fashion. I won’t be teaching you to write MapReduce jobs or anything of the sort.
The first part will be an overview of the types, formats and structures of data that aren’t normally in the data warehouse realm. The second part will cover some of the basic technology components, vendors and architecture.
The goal is to provide an overview of the extent of data available and some of the nuances or challenges in processing it, coupled with some examples of tools or vendors that may be a starting point if you are building in a particular area.
Disruptive Innovation: how do you use these theories to manage your IT? – Mark Madsen
The term disruptive innovation was popularized by Harvard professor Clayton Christensen in his 1997 book “The Innovator’s Dilemma.” Nearly 20 years later “Disrupt!” is a popular leadership mantra that is more frequently uttered than experienced. You can't productize it. You can't always control it – at least not the effects it has in practice. You aren't necessarily going to like every product of innovation. So are you sure you want it? If so, how do you promote a culture in which innovation can flower – and, potentially, thrive? Because that's probably the best that you can do.
Perhaps there's a better framing for innovation than just "disruption." This session is an overview of commoditization and innovation theories, followed by basic things you can do to apply that theory to your daily job of architecting, choosing and managing a data environment in your company.
BioIT World 2016 - HPC Trends from the Trenches – Chris Dagdigian
This document discusses trends in bioinformatics infrastructure and IT from the 2016 BioIT World Conference. It notes that science is evolving faster than IT can refresh infrastructure and patterns. There is a trend toward DevOps, automation, and scripting skills being necessary for career mobility. Cloud computing and virtualization are becoming more widespread. Data lakes and Hadoop are also growing trends, though expertise is still needed. The document also discusses trends in computing, including the need for mobile analysis and common hardware for HPC and Hadoop. Storage trends include the rise of data, refresh of scale-out NAS, and new disruptive storage platforms.
This was a 30 min talk intended as one of the opening/overview presentations before a full-day deep dive into ScienceDMZ design patterns and architectures.
Direct downloads are not enabled. Contact me directly (chris@bioteam.net) if you for some odd reason want a copy of this slide deck!
Facilitating Collaborative Life Science Research in Commercial & Enterprise E... – Chris Dagdigian
This is a talk I put together for a http://www.neren.org/ seminar called "Bridging the Gap: Research Facilitation". I tried to give a biotech/pharma view for a mostly academic audience.
The way we make decisions has changed. The data we use has changed. The techniques we can apply to data and decisions have changed. Yet what we build and how we build it has barely changed in 20 years.
The definition of madness is doing more of what you already do and expecting different results. The threat to the data warehouse is not from new technology that will replace the data warehouse. It is from destabilization caused by new technology as it changes the architecture, and from failure to adapt to those changes.
The technology that we use is problematic because it constrains and sometimes prevents necessary activities. We don’t need more technology and bigger machines. We need different technology that does different things. More product features from the same vendors won’t solve the problem.
The data we want to use is challenging. We can’t model and clean and maintain it fast enough. We don’t need more data modeling to solve this problem. We need less modeling and more metadata.
And lastly, a change in scale has occurred. It isn’t a simple problem of “big”. The problem with current workloads has been solved, despite the performance problems that many people still have today. Scale has many dimensions – important among them are the number of discrete sources and structures, the rate of change of individual structures, the rate of change in data use, the variety of uses and the concurrency of those uses.
In short, we need new architecture that is not focused on creating stability in data, but one that is adaptable to continuous and rapidly changing uses of data.
This is a very short slide deck I did for a 10-minute slot on a http://pistoiaalliance.org/ webinar. The slides do not fully cover what I intend to talk about so if the webinar is recorded and available afterwards I'll update this description with the recording URL.
PDF copy of the slides available upon request ("chris@bioteam.net")
This is a custom "Bio IT trends/problems" deck that I did for a general but highly technical audience at the 2014 Internet2 Technology Exchange conference.
Download of the raw PPT is disabled; contact me at chris@bioteam.net if a direct copy or PDF of the presentation would be useful.
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting – Chris Dagdigian
October 2013 "Beyond the Genome" presentation slides. Talk is mostly focused on issues around IaaS cloud usage for "Bio-IT" and life science informatics & scientific computing.
PDF SLIDES AVAILABLE DIRECTLY - PLEASE EMAIL "CHRIS@BIOTEAM.NET" FOR SLIDES
2014 BioIT World - Trends from the trenches - Annual presentation – Chris Dagdigian
Talk slides from the annual "trends from the trenches" address at BioITWorld Expo. 2014 Edition.
### Email chris@bioteam.net if you'd like a PDF copy of this deck ###
Taming Big Science Data Growth with Converged Infrastructure – The BioTeam Inc.
2014 BioIT World Expo presentation
"Many of the largest NGS sites have identified IO bottlenecks as their number one concern in growing their infrastructure to support current and projected data growth rates. In this talk Aaron D. Gardner, Senior Scientific Consultant, BioTeam, Inc. will share real-world strategies and implementation details for building converged storage infrastructure to support the performance, scalability and collaborative requirements of today's NGS workflows. "
For a copy of this presentation please email: chris@bioteam.net
This document discusses strategies for document capture in enterprise content management (ECM) systems. It describes the importance of document capture and outlines three main approaches: centralized, distributed, and hybrid. In a centralized approach, all scanning and processing is done at a central location, but this has disadvantages like transport costs, latency, and dedicated staffing needs. Distributed capture allows scanning and indexing to occur wherever documents originate, addressing some of the issues with centralized models. The document analyzes factors to consider in building an effective document capture strategy.
BioITWorld 2013 presentation - Best practices for building multi-tenant HPC clusters for Pharma/BioTech
Essentially a mini case study of a recent deployment of a multi-petabyte, 1000+ CPU core Linux cluster in the Boston area.
Please email me at: chris@bioteam.net if you would like the actual PDF file itself.
Everything Has Changed Except Us: Modernizing the Data Warehouse – Mark Madsen
This document discusses modernizing data warehouse architecture to handle changes in data and analytics needs. It argues that the traditional data warehouse approach of fully modeling data before use is untenable with today's data volumes and rates of change. Instead, it advocates for a layered architecture that separates data acquisition, management, and delivery into independent but coordinated systems. This allows each layer and component to change at its own pace and focuses on data access and usability rather than strict control and governance. The goal is to design systems that can adapt to changes in data and analytics uses over time rather than trying to plan and control everything up front.
Start Today: Digital Stewardship Communities & Collaborations – Trevor Owens
The increasingly digital records of our communities and our organizations require all of us to become digital stewardship and digital preservation practitioners. The challenge seems daunting but the good news is we don’t have to do it alone. A distributed network of practitioners and learners across the country are increasingly finding ways to learn together and share and pool their resources to tackle these challenges and provide enduring access to our digital heritage. Owens’ talk will provide examples of how archivists are rising to the challenge and practical guidance for both digital preservation beginners and experts.
Mapping Life Science Informatics to the Cloud – Chris Dagdigian
This document discusses strategies for mapping informatics to the cloud. It provides 9 tips for doing so effectively. Tip 1 advises that high-performance computing and clouds require a new model where resources are dedicated to each application. Tip 2 recommends hybrid cloud approaches but cautions they are less usable than claimed and practical only sometimes. The document emphasizes the need to handle legacy codes in addition to new "big data" approaches.
The document discusses IBM's Big Data and analytics solutions, including Watson Explorer which provides a single interface to access both structured and unstructured data. It also outlines several common use cases for big data such as customer analytics, security intelligence, and operations analysis. The final section provides contact information for an IBM sales manager to discuss these big data solutions.
Humans in a loop: Jupyter notebooks as a front-end for AI – Paco Nathan
JupyterCon NY 2017-08-24
https://www.safaribooksonline.com/library/view/jupytercon-2017-/9781491985311/video313210.html
Paco Nathan reviews use cases where Jupyter provides a front-end to AI as the means for keeping "humans in the loop". This talk introduces *active learning* and the "human-in-the-loop" design pattern for managing how people and machines collaborate in AI workflows, including several case studies.
The talk also explores how O'Reilly Media leverages AI in Media, and in particular some of our use cases for active learning such as disambiguation in content discovery. We're using Jupyter as a way to manage active learning ML pipelines, where the machines generally run automated until they hit an edge case and refer the judgement back to human experts. In turn, the experts train the ML pipelines purely through examples, not feature engineering, model parameters, etc.
Jupyter notebooks serve as one part configuration file, one part data sample, one part structured log, one part data visualization tool. O'Reilly has released an open source project on GitHub called `nbtransom` which builds atop `nbformat` and `pandas` for our active learning use cases.
This work anticipates upcoming work on collaborative documents in JupyterLab, based on Google Drive. In other words, where the machines and people are collaborators on shared documents.
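As a rough sketch of the human-in-the-loop pattern described above (generic, illustrative Python only; the function and threshold names are invented and are not part of the nbtransom API):

```python
# Illustrative only: a pipeline that runs automatically while it is confident
# and defers edge cases to human experts, whose answers become new training
# examples. Names here are hypothetical, not taken from nbtransom.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.85  # below this, ask a person

@dataclass
class Item:
    text: str
    label: Optional[str] = None

def active_learning_pass(
    items: List[Item],
    predict: Callable[[str], Tuple[str, float]],   # model: text -> (label, confidence)
    ask_expert: Callable[[Item], Item],            # human judgement for edge cases
) -> Tuple[List[Item], List[Item]]:
    auto_labeled, reviewed = [], []
    for item in items:
        label, confidence = predict(item.text)
        if confidence >= CONFIDENCE_THRESHOLD:
            item.label = label                 # machine handles the easy cases
            auto_labeled.append(item)
        else:
            reviewed.append(ask_expert(item))  # person handles the edge cases
    # The reviewed items become retraining examples; the experts never touch
    # features or model parameters directly.
    return auto_labeled, reviewed
```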
A sample of my book "Business unIntelligence - Insight and Innovation beyond Analytics and Big Data", published by Technics Publications, 2013.
Chapter 5 shows the evolution of the Data Warehouse architecture and provides a description of some aspects of a modern Information architecture.
The book can be ordered in hard and softcopy formats at http://bit.ly/BunI-TP1
Business unIntelligence - a Whistle Stop Tour – Barry Devlin
The old world of business intelligence is being transformed into a new biz-tech ecosystem. Analytics is forcing the recombination of operational and informational systems in a consistent and coherent IT environment for all business activities. Big data—despite the hype—introduces two very different types of information that transform how business processes interact with the external world. Together, these directions are driving a new BI, so different to its prior form that I call it “Business unIntelligence”. This session covers:
- Business drivers and results of the biz-tech ecosystem
- Modern conceptual and logical architectures for information, process and people
- Positioning of all forms of business analytic and big data
Why Big Data Analytics Needs Business Intelligence Too – Barry Devlin
Business and IT are facing the challenge of getting real and urgent value from ever-expanding information sources. Building independent silos of big data analytics is no longer enough. True progress comes only by integrating data from traditional operational and informational sources with the new sources that are becoming available, whether from social media or interconnected machines.
In this April 2014 BrightTALK webinar, Dr. Barry Devlin describes the thinking, architecture, tools and methods needed to achieve a new joined-up, comprehensive data environment.
Humans in the loop: AI in open source and industry – Paco Nathan
Nike Tech Talk, Portland, 2017-08-10
https://niketechtalks-aug2017.splashthat.com/
O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.
This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.
Along with reporting, we're leveraging AI in Media. This talk dives into O'Reilly uses of deep learning -- combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software -- to help editors and customers alike accomplish more of what they need to do.
In particular, we'll show two open source projects in Python from O'Reilly's AI team:
• pytextrank built atop spaCy, NetworkX and datasketch, providing graph algorithms for advanced NLP and text analytics (see the usage sketch after this list)
• nbtransom leveraging Project Jupyter for a human-in-the-loop design pattern approach to AI work: people and machines collaborating on content annotation
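A rough usage sketch for pytextrank, assuming the current spaCy 3.x pipeline-component style API, which may differ from the version shown in this 2017 talk:

```python
# Minimal pytextrank example: TextRank as a spaCy pipeline component.
import spacy
import pytextrank  # registers the "textrank" component with spaCy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed
nlp.add_pipe("textrank")

doc = nlp(
    "Graph algorithms such as TextRank build a graph of candidate phrases "
    "and rank them, which supports keyphrase extraction and summarization."
)

# Top-ranked phrases from the TextRank graph.
for phrase in doc._.phrases[:5]:
    print(f"{phrase.rank:.3f}  {phrase.text}")
```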
Cloud Sobriety for Life Science IT Leadership (2018 Edition) – Chris Dagdigian
Chris Dagdigian provides practical tips for life science IT leadership based on his experience working in bioinformatics. Some key points include:
1) Cloud adoption in life sciences is driven by the need for flexible capabilities and collaboration rather than cost savings alone.
2) Common mistakes include lack of planning, bypassing security reviews, and forcing legacy patterns onto cloud infrastructure.
3) AWS is the leader in cloud capabilities but all providers oversimplify challenges in their marketing. Real-world requirements around networking, security and provisioning need to be considered.
Innovation with big data: Chr. Hansen's experiences – Microsoft
In many places Big Data is still the new and unknown thing that is not a top priority for IT, because "we don't have large volumes of data". But Big Data is much more than large volumes of data. At Chr. Hansen A/S, the Research and Development (Innovation) department has worked with the value of data and, as a result, has established a cross-disciplinary BioInformatics programme built on Big Data technologies from Microsoft.
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014 – Josh Patterson
Josh Patterson is a principal solution architect who has worked with Hadoop at Cloudera and Tennessee Valley Authority. Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity servers. It allows for consolidating mixed data types at low cost while keeping raw data always available. Hadoop uses commodity hardware and scales to petabytes without changes. Its distributed file system provides fault tolerance and replication while its processing engine handles all data types and scales processing.
The document discusses the rise of big data and how organizations can leverage it. It defines big data as data that cannot be analyzed with traditional tools due to its large volume, velocity, and variety. It describes how technological advances have led to more data being generated and collected from a variety of sources. The document advocates that organizations must find ways to analyze all this data to gain valuable insights that can improve decision making, customer experiences, and business strategies. It provides several examples of how companies in different industries have successfully used big data analytics.
Presentation: “Big Data and MicroStrategy: Building a Bridge for the Elephant”
Intelligent engineering of an agile business requires the ability to connect the vast array of requirements, technologies and data that build up over time, while avoiding the pitfalls commonly encountered on the road to giving users comprehensive, yet nimble business analytics with MicroStrategy.
The Google generation, armed with iPads and Droid phones, brings big, bold ideas about how “Big Data” will solve the new wave of business problems; traditional users know that addressing those problems requires more than just embracing buzzwords like “sentiment”, “R” and “Hadoop.” Overall success requires building a bridge between the stable, proven, mature BI solutions in place today and the disruptive new world. Enabling deeper analytics, predictive modeling and social media analysis in combination with scalable self-service dashboards, reporting and analytics is no longer an idea but a MUST DO.
This informative presentation describes these business challenges and how an organization leveraged the Kognitio Analytical Platform under MicroStrategy to build such a bridge.
Android is an open-source, Linux-based operating system designed primarily for smartphones and tablets. Initially created by Android Inc., which was later acquired by Google, Android was unveiled in 2007. It has the largest worldwide market share of any mobile operating system. Key aspects include being open-source, having a large developer community creating applications, and allowing device manufacturers to customize Android for their devices.
This document appears to be about flat plan designs created by Lauren Morgan. It seems to be advertising the services of Lauren Morgan, who creates flat plan designs under the name "My Flat Plan Designs." The document likely contains contact information for Lauren Morgan and details about the types of flat plan designs she creates.
The hybrids are coming: The Era of Touchscreen Hybrids – John Whalen
Interaction Design For Keyboard / Touchscreen Hybrids: How Your Designs Need To Change
John Whalen, UX Lead & Founder
Brilliant Experience
User Focus 2012 - UXPA-DC
Learn how interaction design is changing in the era of "tablet transformers" and "touchscreen laptops".
When do users click or touch? How do interaction designs need to change to provide a great user experience? Using some of the biggest sites on the web built here in Washington (e.g., Marriott, Living Social, USA Today) we will reveal the strengths and weaknesses of state-of-the-art designs.
In a live "UX cage match" volunteers from the audience will race to find the answer to questions using different sorts of devices (small tablet, tablet with keyboard, tablet transformer, laptop), demonstrating the unique benefits and constraints of each device type.
After that we will show clips from our research revealing how current designs fall short for users of touch/type hybrids. Based on the data we collected we will attempt to answer the key UX question: How are interaction design patterns changing and how will my site need to change to accommodate the next wave of devices?
This document summarizes a study of CEO succession events among the largest 100 U.S. corporations between 2005-2015. The study analyzed executives who were passed over for the CEO role ("succession losers") and their subsequent careers. It found that 74% of passed over executives left their companies, with 30% eventually becoming CEOs elsewhere. However, companies led by succession losers saw average stock price declines of 13% over 3 years, compared to gains for companies whose CEO selections remained unchanged. The findings suggest that boards generally identify the most qualified CEO candidates, though differences between internal and external hires complicate comparisons.
The evolution of the collections management system – irowson
The document summarizes the evolution of collections databases from the 1960s to present day. It discusses how early systems automated library card catalogs, how CMS systems emerged in museums driven by accountability needs, and how continual technology changes like personal computers and the internet impacted systems. It also explores current trends like APIs, cloud computing, and underutilization of CMS functionality despite advances.
Neil Perlin is an internationally recognized content consultant who helps clients create effective content across various mediums. The document discusses several predictions for the future of technical communication, including increased use of mobile-friendly responsive design, topic-based authoring, structured authoring using standardized styles, and analytics to track content usage. It also covers trends toward open web standards, cloud-based tools, and smaller chunks of reusable content.
IT Performance Management Handbook for CIOs – Vikram Ramesh
Learn why measuring performance on individual devices and systems often leaves admins flying blind when it comes to SLA management and identifying performance bottlenecks. This in-depth e-Guide talks about how VirtualWisdom4 can give administrators a live, up-to-the-second view across the system-wide IT infrastructure.
Solve User Problems: Data Architecture for Humans – Mark Madsen
We are bombarded with stories of the latest products to hit the market – products that will change everything we do. This causes us to focus on the latest technology, building IT for the sake of building IT. Meanwhile, the world still seems to run on Excel.
The “big innovators” who have and use unimaginably large amounts of data are not the norm. Aspiring to use the same complex technologies and patterns they do leads to poor investments and tradeoffs. This is an age-old problem rooted in the over-emphasis of technology as the agent of change. Technology isn’t the answer – it’s the platform on which people build answers.
To emphasize technology is to ignore the way tools change people and practices. The design focus in our market was on storing and making data accessible. If we want to make progress then we need to step back from the details and look at data from the perspective of the organization. Our design focus shifts to people learning and applying new insights, asking questions about how an organization can be more resilient, more efficient, or faster to sense and respond to changing conditions.
In this talk you will learn how to put your data architecture into a human frame of reference. Drawing inspiration from the history of technology and urban planning, we will see that the services provided by the things we build are what drive success, not the latest shiny distraction.
Data centers are growing to accommodate more internet-connected devices, with innovations helping achieve network coverage for billions of devices by 2020. As data centers grow, trends like software-driven infrastructure, microtechnology, and alternative energy use are making data centers more efficient by consolidating resources and reducing size. Hyperconvergence allows more efficient use of rack space by consolidating computer storage, networking, and virtualization in compact 2U systems from companies like Simplivity and Nutanix.
Planning and Managing Digital Library & Archive Projects – ac2182
The document provides an overview of a workshop on developing and managing digital library and archive projects. It includes the workshop schedule, introductions from attendees, strategies for success, managing born-digital assets and digitized content, infrastructure requirements, and considerations for digital preservation over the long-term.
This document discusses best practices for content delivery platforms to support artificial intelligence projects. It recommends that platforms (1) accept that they do not have all the data needed and should integrate third-party sources, (2) provide consistent tagging of content, (3) offer a lightweight programmatic interface, (4) embrace allowing large amounts of content to be taken offline for analysis, and (5) enable complex filtering and selection of data. The document also suggests platforms could consider offering preprocessed datasets or AI tools as new products.
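A hypothetical sketch of the kind of lightweight, filterable programmatic interface the document recommends; the endpoint, parameters and field names below are invented for illustration and do not describe any real platform:

```python
# Hypothetical content-platform query: consistent tags, machine-readable
# output, filtering, and bulk export for offline analysis.
import requests

BASE_URL = "https://content.example.org/api/v1/items"  # invented endpoint

params = {
    "tag": "oncology",              # consistent content tagging
    "updated_since": "2018-01-01",  # incremental export
    "format": "json",               # machine-readable output
    "page_size": 500,               # large pulls for offline analysis
}

response = requests.get(BASE_URL, params=params, timeout=30)
response.raise_for_status()
for item in response.json().get("items", []):
    print(item.get("id"), item.get("tags"))
```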
This document discusses the importance and evolution of data modeling. It argues that data modeling is critical to all architecture disciplines, not just database development, as the data model provides common definitions and vocabulary. The document reviews the history of data management from the 1950s to today, noting how data modeling was originally used primarily for database development but now has broader applications. It discusses different types of data models for different purposes, and walks through traditional "top-down" and "bottom-up" approaches to using data models for database development. The overall message is that data modeling remains important but its uses and best practices have expanded beyond its original scope.
Capgemini Ron Tolido - the 3rd Platform and Insurance – EDGEteam
1) The document discusses digital transformation in the insurance industry and outlines several frameworks for how insurance companies can progress in their digital capabilities and mastery.
2) It presents different "levels" of digital capability that insurance companies may fall into, from "beginners" to "digital masters", and suggests that most insurers currently rank as "conservatives".
3) Several technology trends and drivers are introduced that can help insurance companies advance their digital transformation, such as social, mobile, analytics/big data and cloud computing. Combining these drivers is seen as particularly powerful.
Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme... – Findwise
This white paper elaborates the role of enterprise search technology as an intelligent retrieval platform for structured data, a role traditionally held by Relational Database Management Systems (RDBMS). Furthermore, it investigates the possibility of using enterprise search solutions to derive insights and patterns by also analyzing unstructured data, which is not possible with traditional data warehouse systems based on RDBMS.
Thinking Outside the Cube: How In-Memory Bolsters Analytics – Inside Analysis
The Briefing Room with Mark Madsen and IBM
Live Webcast on Aug. 27, 2013
Visit: www.insideanalysis.com
What's old is often new again, especially in the world of information management. The innovation of OLAP cubes years ago transformed business intelligence by empowering analysts with significantly faster number-crunching capabilities. Today, with data volumes exploding, a new kind of cube is offering similar value, thanks in large part to in-memory analytics.
Tune in to this episode of The Briefing Room to learn from veteran analyst and practitioner Mark Madsen of Third Nature, who will explain how this new wave of in-memory technology can give analysts a needed boost for dealing with the rising tide of data volumes and types. He'll be briefed by Chris McPherson of IBM Business Analytics, who will tout IBM Cognos Dynamic Cubes, which were specifically designed to let business users maintain the speed and agility they need for their analytical solutions.
The document summarizes the goals and components of the Artificial Technology Center and its Digital Library project. The Center aims to advance high-speed internet applications through research and development. Its Digital Library will integrate a physical library with web-based resources to provide new ways for users to access and organize multimedia information from the internet. The Digital Library will have several key software and hardware components, including a physical library space, a website for remote control and access, a query engine for storing and categorizing collected content, and a server to power the system. The goal is to create new commercially viable internet products and technologies through this innovative library environment.
General insurance Accounts, IT and Investment – vijayk23x
The document provides an overview of topics that may be covered in accounting, IT and investment exams, including:
1. The exam questions will be split between investment, IT, accounting standards and ratios, and preparation of financial accounts.
2. IT topics include storage units, network types, protocols, programming languages, databases, data warehousing concepts like data marts, operational data stores, and dimensional modeling techniques like star and snowflake schemas.
3. Key concepts in machine learning, deep learning, big data, data lakes and artificial intelligence are also defined.
Information is at the heart of all architecture disciplines & why Conceptual ... – Christopher Bradley
Information is at the heart of all of the architecture disciplines, such as Business Architecture and Applications Architecture, and Conceptual Data Modelling helps support this.
Also, data modelling, which helps inform this, has been wrongly taught in many universities as being just for database design.
chris.bradley@dmadvisors.co.uk
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data – Cloudera, Inc.
This document discusses how Cloudera Enterprise Data Hub (EDH) can be used for advanced analytics. EDH allows users to perform diverse concurrent analytics on large datasets without moving the data. It includes tools for machine learning, graph analytics, search, and statistical analysis. EDH protects data through security features and system change tracking. The document argues that EDH is the only platform that can support all these analytics capabilities in a single, integrated system. It provides several examples of how advanced analytics on EDH have helped organizations like the government address important problems.
Essay on Database
Database Essay
Different Types of Databases Essay
Database Systems Essay
Essay Database
Database Design Essay examples
Database Administrators
Database Research Essay
Essay on Database design process
Essay on Databases
Databases Essay
Experience Probes for Exploring the Impact of Novel Products – Mike Kuniavsky
This presentation includes an overview of PARC, of Innovation Services at PARC and our use of social science, and a description of a process we use, experience probes, to reduce the risk of adopting novel technologies while still making breakthrough innovations.
Spectrum 16 pmc 16 - mobile and tech comm – Neil Perlin
Mobile technology is spreading beyond just phones and tablets and will significantly impact technical communication. To prepare, technical communicators should define the value they provide, consider new business models, focus on search-based navigation over indexes, write more concisely for mobile, and learn new skills like CSS. In the future, they may need to create true mobile apps, explore new interfaces like voice control, and personalize content based on user analytics. Mobile will change how technical communication is delivered and require adaptation.
1. The document discusses a new approach called the Cloud Analytics Reference Architecture that aims to better utilize big data.
2. It removes traditional constraints of data silos by consolidating all data in a "data lake" accessible for analysis.
3. This allows analysts to search for insights and patterns across all available data rather than being limited to specific predefined queries of individual data sets.
A presentation delivered in Sydney Australia on existing web technology and some of the newer emerging web technologies and how to use them in your business
Collections Databases: Making the system work for you
1. adlib
Collections Databases: Making The System Work For You
Ian Rowson
My presentation today offers an overview of the process of purchasing a Museum Collections
Management System (CMS), and along the way I’m going to identify seven ‘golden rules’ to help
minimise the risks inherent in this process. Museums come in all shapes and sizes, of course, and so it’s
difficult to give a presentation which is completely relevant to all. So I’m trying to pitch this more
toward the smaller institution, on the assumption that large institutions should, in theory, have more
skills and experience available in-house to help them. Hopefully there will be something here that is
relevant to anybody considering a CMS project.
I’ve been involved in the world of museum CMS for about 12 years, from both the museum side as a
software purchaser and sometime software builder, and also from the software supplier perspective in
my current role as Manager of Adlib UK. In this time, I’ve seen many projects come and go.
I hope it’s fair to say most have had a positive outcome, not that I claim any particular credit for that.
CMS projects are, or at least should be, very much team efforts. Such projects involve resources, mainly
time and money, which are much too valuable to waste. But, as is usual in life, it is the projects that hit
problems needing to be resolved that provide the greatest learning experiences.
What I hope you will retain from this presentation is my list of golden rules of CMS projects.
I make no claim that this list is exhaustive, or for that matter unbiased, but if you keep at least these
points in mind, they should help you to avoid the worst dangers.
To begin, I want to give a bit of background. Why do we need a CMS in today’s museum?
Most museums have some kind of database, even if it’s just an Excel spreadsheet, to record details of
their collection. It is, after all, a requirement of the Museum Accreditation Scheme to make a catalogue,
and a database offers an attractive way of doing this.
The classic justification is, of course, that museums contain a lot of information. By improving access to
that information, many collections management processes can be streamlined.
2. adlib
But a good CMS is about recording more than just collections data, such as object names, makers and
dimensions.
What about all the other information resources the museum may hold?
• Files about objects, artists or makers, which could include archive material.
• Research resulting from exhibitions or academic study
• Published Material pertaining to collections or objects
• Interpretive texts
• Texts written for educational resources or web presentation.
• Digital Assets – Images, sound recordings, digitised video or film
• Information about objects gained from visitors
I’m sure you can think of many other examples.
If you don’t attempt to manage these resources and bring them under control, their purpose and
usefulness to the museum are severely diminished. A museum can gain much from making the
information it holds accessible, in a consistent and comprehensive fashion, to all the different staff who
work within it, to say nothing of making information available outside the institution.
At the moment, in my opinion, good old-fashioned information management does not have the profile
it needs to have in our profession. Everyone’s attention seems to be focussed on more exciting topics,
such as the possibilities offered by social networking on the web. Blogs and podcasts are today’s hot
topics of discussion, and while I’m not arguing against doing those things, what I would ask is: what
happens to the information you gather? If you don’t store and manage it properly, then it will be
lost, which to my mind rather renders the whole exercise pointless.
The principles of good information management should underpin any museum project using ICT, but
I’ve seen too many instances where designers have been engaged to create a website, or a gallery
interactive, where the content cannot be repurposed, or sometimes even edited, by the museum in any
way.
3. adlib
So my first golden rule is: avoid establishing ‘silos’ of unconnected data which are inaccessible and
therefore unusable for other purposes. When commissioning any software in the museum, for whatever
function, the question you should ask is: "How easily can we access the data in this system to use it in
other ways?"
The collections management system, which is designed specifically for the purpose of information
management, should be the natural home of all information resources in the museum.
Now I wish to address the main issues that may impact CMS projects. I suppose you could say, "But
what could go wrong?" After all, we’re not talking about rocket science here. We’re talking about
installing a computer system running database software in a museum. Surely that’s not a problem?
Computers are easy now; even my kids can use them.
Well, it’s true that computers, or information technology, have become ubiquitous in society. Even at
home we are using things, such as wireless broadband, that would have been unimaginable 20 years
ago. So information technology has obviously become much more accessible, but computer systems
for professional use are still some way from being ‘consumer goods’ that you can just ‘fit and forget’.
In particular, there are some big issues which characterise museum CMS projects and which simply can’t
be overlooked.
Firstly, museum information management requirements are quite complex. They can involve
different kinds of information resources (as I’ve highlighted). Even object data can vary widely – for
example, a natural history specimen has very different information recording needs to a fine art
object, yet they often need to co-exist in the same database. If you take the view they can be in
separate databases, then already you’re on the way to creating silos – how do you search across
these databases? How do you create links between object records within different databases? How
do you ensure consistent vocabulary is used to describe them?
Secondly, museums collect under the ethos that their objects are to be maintained ‘in perpetuity’.
Managing them effectively also means retaining the information about them ‘in
perpetuity’. Herein lies a challenge.
Computer systems used in a commercial environment have a typical projected lifespan of about
4 years before they will be replaced. In fact, it could be argued that the whole ICT industry is
4. adlib
based on rapid change and obsolescence of its products. Horror stories about this have even
made it into the national press – which shows it must be true!
A notorious example of this was the BBC’s £2.5m Domesday project, which became obsolete
less than 20 years after it was created.
I’m sorry to say that this issue does not seem to have the profile it deserves within the museum
profession, and even some software vendors in the market today place little emphasis on this.
After all, what salesperson wants to raise doubt in the customer’s mind about what happens to
their data if and when they decide to stop using the expensive software they are just about to
invest in?
But it is a truth that, compared to the massive investment of staff time in creating data resources,
the cost of any software application (which, after all, is only ever a temporary home for that data) is
fairly insignificant. The pace of change is such that who knows what kind of computer systems
or software we will be using in ten or even twenty years’ time?
So I would argue that the most important criterion for software purchase today is to make sure
that it will allow us, at some point in the future, to extract our data without loss or huge expense.
However, it is by no means a given that all software applications will permit easy extraction of
data, and my company knows this better than most, because we spend most of our waking
hours converting data from other systems into Adlib. That is why we pledge to all our
customers that you can, and will be able to, extract ALL your data from OUR software
applications as fully portable XML files.
So my second golden rule: make sure you can extract your data in a suitable format from any software
you purchase – in fact, ask to see this demonstrated if you are given a software presentation.
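To make this rule concrete, here is a minimal sketch, written in Python purely for illustration, of the kind of self-describing XML export you should expect any system to be able to produce. The field names, values and file name are hypothetical and do not represent any particular vendor’s actual export format.

import xml.etree.ElementTree as ET

# A single catalogue record, with invented field names and values.
record = {
    "object_number": "1984.123",
    "object_name": "teapot",
    "maker": "Josiah Wedgwood & Sons",
    "dimensions": "H 14 cm x W 22 cm",
}

# Build a simple, self-describing XML structure: one <record> per object,
# one element per field, so any other system (or a text editor) can read it.
root = ET.Element("recordList")
rec = ET.SubElement(root, "record")
for field, value in record.items():
    ET.SubElement(rec, field).text = value

# Write the export with an XML declaration and UTF-8 encoding.
ET.ElementTree(root).write("export.xml", encoding="utf-8", xml_declaration=True)

The point is not this particular layout, but that every field is named, the file is plain text, and nothing about it depends on the software that produced it.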
I’m now going to talk a little about how to begin planning a project. This is a big subject, which is
rather beyond the scope of this presentation, but I can point you towards some good sources of advice,
such as Managing New Technology Projects in Museums and Galleries, available from the Collections Trust,
and the Archives & Museum Informatics publication Planning for Museum Automation. Both of these
publications, although a few years old now, have a lot of relevant material in them.
5. adlib
One thing you can be sure of: Projects will always expand to fit or exceed the time available!
An important factor to consider is that of confidence. As a member of museum staff, perhaps working
in the documentation or collections management department, how often do you purchase a computer
system? Hopefully no more than once in any one organisation you may be working in.
Yet projects to purchase and install a CMS typically require more than a passing level of technical
understanding. At the very least, it is going to mean dealing with technical people, who quite often seem
to talk a different language to museum people.
At Adlib we recognise that this can lead to misunderstandings which can pose a risk to any project. We
seek to minimise these risks by employing people who are museum qualified and museum experienced
in our sales and consultancy roles. This means such staff can act as interpreters between our customers
and our technical people, the people who actually make things work.
This helps too, because sometimes museum staff are uncomfortable dealing with the ICT department
of their own organisation, be it a local authority or university, for example, let alone with an external
company trying to sell them a product.
You are skating on thin ice if you are having to rely on a member of staff who has recently bought a
PC for the kids to do their homework on as your expert in information systems procurement. But,
unfortunately, this is sometimes the case in a small museum.
So my third golden rule is: try to get some help. Don’t attempt to do it all on your own. Try to build a
project team that includes people from your IT department. If this is not possible, then in any event try
to use a supplier that can demonstrate good experience and understanding of museum projects.
Having decided that some form of database system is required, and assembled a project team to
undertake the procurement of it, the next step is to consider how to proceed. There are two broad
possibilities:
If someone in the museum is quite IT literate, you might consider a DIY approach. A simple museum
database can be built with software such as Microsoft Access, which the museum may already own.
6. adlib
This could seem a tempting possibility, particularly if budgets are tight, or you have a member of staff
who fancies themselves as a systems developer.
Alternatively, and assuming you have your own money-printing equipment, you could employ a
contract software developer to build a system for you.
There are issues that you need to be aware of, however, if you consider going down the DIY route.
Firstly, the issue of time. To design and construct a database application takes time. Lots of it. Can you
spare the member of staff from their usual duties to do this? What other work will become neglected
while the project proceeds? It is amazing just how much time such a project can soak up.
Secondly, can you be sure of a good outcome? If a member of staff’s time is invested in this for weeks or
even months, what if they get out of their depth and can’t finish?
Thirdly, what always happens is that your resident expert leaves for another job. If that happens, is
anyone else left who understands the system or is capable of maintaining it?
At Adlib, we spend a lot of our time converting data from home-grown database projects that have
either hit one of the problems mentioned, or simply reached the point where the system cannot
offer the functions that the museum needs.
Adlib software has been around since the 1970s, which means that dozens of man-years (and woman-
years) have gone into defining and refining its functionality. This means it can do things that would be
impractical to develop in-house.
My belief, and I can speak from experience as someone who has in the past been a museum’s resident
Access developer, is that software such as Access offers, at best, a solution which is only really
applicable if the requirement is basic and the budget is non-existent.
Should your project come into this category, I would instead recommend that you consider using Adlib
Museum Lite, our free basic catalogue software package, which is available for download from our
website. By using this, you could save yourself a lot of time and trouble. In the future, the data you enter
7. adlib
into Museum Lite can always be transferred into more powerful software should the need arise. If
you buy one of our products, we will do this conversion for free.
The most common way of proceeding with a CMS project is to purchase a commercially made software
package for museum management; Adlib is one of about a dozen vendors currently operating
in the UK market.
So how can you go about choosing which one is right for you?
Well, firstly, the processes of museum collections documentation are well defined by the SPECTRUM
standard. So there is no real need to re-invent that particular wheel in trying to write a functional
software specification. It’s already there.
The Collections Trust administers a compliance programme, where software suppliers can have their
system validated against the SPECTRUM standard. Currently, the Collections Trust website lists five
compliant systems, of which Adlib is one.
I would always recommend adherence to standards, for a variety of reasons, but not least because to do
so makes your data more easily ‘interoperable’, meaning you could more easily exchange it with
another institution if you so wish, or, dare I say, when the time comes to move it from one Collections
Management System to another.
I spoke earlier on about the CMS being a natural home for museum data. Well, SPECTRUM defines
museum data clearly enough, but what about the other resources I mentioned earlier? You may have
archival or library data, or digital resources relating to your collections.
There are other standards that apply in these cases; for example, ISAD(G) for archival data and AACR2
for library data. Dublin Core is often employed for digital asset metadata. You probably won’t be too
surprised to hear that Adlib follows these standards in the implementation of archival, library and
digital asset catalogue modules, which may be integrated with the museum system to allow cross-
searching, linkage of different material types and use of common terminology for cataloguing.
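To show how lightweight such metadata can be, here is a small sketch, again in Python and purely illustrative, of a Dublin Core description for a digital image. Only the element names and namespace come from the Dublin Core standard; the values and file name are invented and are not taken from any real catalogue.

import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# Invented metadata for a hypothetical digital asset.
metadata = {
    "title": "Teapot, front view",
    "creator": "Photography Department",
    "date": "2008-05-14",
    "format": "image/tiff",
    "identifier": "IMG_1984_123_01",
    "relation": "Object 1984.123",
}

# One Dublin Core element per entry, e.g. <dc:title>Teapot, front view</dc:title>.
root = ET.Element("metadata")
for element, value in metadata.items():
    ET.SubElement(root, "{%s}%s" % (DC, element)).text = value

ET.ElementTree(root).write("dc_record.xml", encoding="utf-8", xml_declaration=True)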
So my fourth golden rule is ‘Standards do matter’. They help to keep data ‘open’ for exchange and
movement, and sometimes it is even a requirement of grant giving bodies that they are adhered to.
8. adlib
Standards are all well and good, you may say, but what about my special requirement to do ‘X’? Our
collection is unique, and we have a long tradition of recording such-and-such information that is not
part of SPECTRUM.
This argument is sometimes used to win support for ‘home-grown development’, the argument being
that ‘no commercially available package could ever meet our needs’.
While it is true that museum software systems by their nature have to be designed to support the
operation of a ‘generic’ museum, if such a place exists, the software package should also be flexible
enough to be configured to cope with any special requirements that may crop up in future, without
having to go down the route of bespoke software development. After all, who knows what is around
the corner?
To give an example, imagine you decide to embark on a project to use volunteers to enter the content
of several hundred MDA cards onto your database. Processes such as this can be greatly simplified if you
can create your own screen layouts. Adlib comes delivered with a tool called Designer which you can
use to do exactly that. In fact, you can do a lot more: you can also create new fields and
indexes, change screen texts and colours, and even build yourself a whole new specific database if you
wish. We offer a range of training courses to customers who wish to learn how to do this kind of thing
themselves, or, of course, you can commission such work from us.
So my fifth golden rule is ‘flexibility is important’. No-one wants to be locked in to working a certain
way just because the software they’re using dictates it. How easy is it to change things?
Now, having decided to buy a standards-compliant, flexible system, how else do you choose?
Well, there are a couple of factors you may want to take into consideration:
Firstly, you’re not just buying software. Inevitably, you are in fact buying into a long-term relationship
with a supplier, of which the purchase of a system is only the initial step. I’m talking here about
ongoing software support.
9. adlib
Software is continually being upgraded and having bugs fixed. It’s a bit like when you buy a
car: you need to have it regularly serviced to keep it running sweetly. But it’s more complicated than
that, because when you update software you usually also get new features added to it, and those new
features may bring new bugs along with them. And so the process goes on; it is just a fact of life.
So you need to be sure that the software supplier is capable of delivering an appropriate level of after-
sales support, sometimes formalised in a ‘service level agreement’, and this you can typically judge by:
1) the scale of the organisation: do they have technical people in the UK who are available when you
need them on the phone, or do you have to wait until they have woken up, because their office is on
the other side of the world?
2) feedback from people who are currently using the system about the after-sales
service provided. Any supplier worth their salt will provide names of similar institutions using their
products that you can contact.
Learn about the supplier by gathering information. Obviously, events such as the Museum and Heritage
show provide an ideal opportunity for this, because all of the major suppliers are here for you to talk to
and collect sales literature from.
However, although sales literature is useful, it is only the ‘gloss paintwork’. What you need to do is
scratch beneath the surface, ideally by getting hold of a demonstration copy of the software so you can
give it a thorough trial run, and by reading through the user manuals.
At Adlib, we have a very open policy which means that demonstration copies of our software and all
our manuals are freely available for download from our website. If you find that a potential supplier is
cagey about letting you have access to this kind of information, then I think you really have to question
why that is so.
So my sixth golden rule is: select a supplier who is able to provide a professional level of service if you
encounter a problem, or should I say ‘when’. No-one likes to be waiting around for time zones to
change before they can talk to the helpdesk, or for a call back from someone who is not really
technically qualified to be doing software support work.
10. adlib
So far, so good. But things may get more complicated if you have to begin a competitive tender
process, which many organisations require for projects that have a budget that exceeds a given
amount.
By its very nature, creating a tender document does require a certain amount of re-invention of wheels
– both on the part of the prospective purchaser, in that functional and service level specifications need
to be written, and on the part of prospective suppliers, who have to respond to them.
My comment about tender documents is, if you have to use them, make sure you allow enough time
for the process to work properly. You’ll need sufficient time to create the documents, and suppliers
need to be given a realistic length of time to respond to them properly. You’ll also need time to
adequately evaluate and compare the responses.
I’m not going to say any more about tenders, because if you need to use them, the chances are there
will be a procurement department within your organisation who will guide and advise you on this.
Having selected a supplier, it’s time to discuss project implementation – in other words, to see about
getting it all installed and working. This is where the fun really begins!
If you have data to be migrated to the new system, make sure you allow enough time and money to
complete this properly. I think it’s fairly safe to say that the success of most projects stands or falls on
the data conversion.
This is why we have evolved a three-stage process, which I’m going to explain, because I’ve found that
many people coming into this for the first time seem to think it is a trivial exercise. Far from it. It cannot
be rushed, and don’t let anyone tell you that it can.
Firstly, we create a mapping document which outlines, for every field of data in the old system,
where it will end up in the new one (a simple illustrative sketch of such a mapping follows below).
This is then discussed and signed off by the customer.
Secondly, we transfer the data according to the rules defined in the mapping document, and
provide it to the customer for checking as a test installation. This gives the opportunity to make
changes if anything has not worked out quite right.
11. adlib
Thirdly, we request a final copy of the data, and transfer it to the new ‘live’ system. This means
that there is plenty of opportunity for checking and correcting any errors before the final switch
over, and that ‘downtime’ is kept to a minimum.
The delivery of the test system also provides an opportunity to test and prove that the technical
infrastructure for the new system is all in order before it goes live.
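As a purely illustrative aside on the mapping step described above, the Python sketch below shows the essence of what a mapping document captures: for each field in the old system, where the value goes in the new one, plus a note of anything not yet covered so it can be raised before sign-off. All field names and values here are invented.

# Invented legacy and target field names; a real mapping document also
# covers value splitting and merging, vocabulary clean-up and defaults.
FIELD_MAPPING = {
    "AccNo": "object_number",
    "ObjName": "object_name",
    "MakerName": "production.creator",
    "DateMade": "production.date",
}

legacy_record = {
    "AccNo": "1984.123",
    "ObjName": "teapot",
    "MakerName": "Josiah Wedgwood & Sons",
    "DateMade": "circa 1790",
    "OldNotes": "transferred from store B",
}

# Apply the mapping to one record.
converted = {new: legacy_record[old]
             for old, new in FIELD_MAPPING.items() if old in legacy_record}

# Anything the mapping does not yet cover needs a decision from the customer.
unmapped = [field for field in legacy_record if field not in FIELD_MAPPING]

print(converted)
print("Fields still needing a mapping decision:", unmapped)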
So my seventh golden rule is: don’t try to rush project implementation, especially any data conversion.
Build in plenty of staff time for data checking, because once you’ve fully switched over to a new system
for data entry, it can be very difficult, if not impossible, to re-run the data conversion.
My final word on the subject is, keep in contact with your supplier. Join the user group. Participate in
any discussion list. Let them know what you think about their products, what features are good, and
what you think needs changing. The relationship between the software supplier and the customer
should ideally be one of mutual benefit.
References:
Perkins, John (1993) Planning for Museum Automation: Student Workbook. Pittsburgh: Archives &
Museum Informatics. Available online at http://www.archimuse.com/publishing/automation.html
Stiff, Matthew (2001) Managing New Technology Projects in Museums & Galleries. Cambridge: MDA.