1. An Association for All IT Architects
Complexity and Robustness in Healthcare Systems
Princeton Digital Advisors
Who’s leading your digital transformation?
a) Your CIO
b) Your CTO
2. A first look at the situation of the COVID-19 outbreak
How customer behaviors and patient-centricity have changed
How do complex, adaptive systems respond?
And a look down the road
What happens after COVID – has the practice of medicine changed?
Is the customer experience of similar quality?
What do we take away as application architects?
3. 1M+ COVID-19 cases out of a US population of 330M; we are not actually sure how many others may have had it, or even died from it
Air cargo down 27% in March; truckload cargo actually up
Major universities go entirely online; University of Chicago students protest paying full tuition for lower-quality online instruction
Movie theaters closed; 38% surge in Netflix usage (and Hulu…)
Airbnb down 96% in Beijing, 50% in US... changing refund policy for thousands of owners; airlines receive bailout and must fly
Energy sector: oil spot price below $40/barrel, down 50% YTD; North Sea crude can still be stored, but not crude in Oklahoma
One third of rent payments due in the first week of April were not made; home mortgage refinances are accelerating, potentially at the best rates in a generation
Restaurant revenue, normally $870B per year, down 35% in the first week of March; 200% growth in online grocery app downloads, 45% growth in online grocery and Walmart curbside orders
35% of NAM survey responses indicated the supply chain was causing delays in manufacturing, just a few months after record-breaking US output of $2.4 trillion in goods produced in Q4 2019
GDP forecast for 2020 revised from 2% growth to a 2-6% contraction. Government response has included sector bailouts and relaxed scheduled payments; industry extends contracts and lets customers skip a payment. What do layoffs mean for discretionary purchases?
Stock markets in US down 16%, back up 8% already – U- or V-shaped recovery?
Do cruise lines have a future?
Some recent facts
4. Who’s giving the guidance?
Inter-organization cooperation: command & control (recommendations versus directives, legal and process agreement on the stockpile, intra-org comms between call center, hospital, and HR), data sharing, supply chain/logistics (testing? ventilators? morgues?), decision making and trials (assessment)
Collaboration technology – patient-to-family, patient-to-caregiver/provider (anxiety, and incorrect or delayed results, amplify the effects of coronavirus), rumors or contrary information (or agitators, like people who cough on produce at the store, or Zoombombers)
Very distributed facilities
Change in scale of usage, and in geographic diversity
Physical facilities needed to be reconfigured or were obsolete
Experience and expectations changing
Technology (and healthcare) reach to all the affected population, not just current customers
Previous experience provided in-person contact as reassurance, high-value add (maybe not so much)
Is security an ASR?
Internal and external business model challenges
Cross-subsidies, referral & billing systems are brittle
Not just healthcare workers on the front lines… postal workers, fire/police, business owners, social services,
inmates, public press, nursing homes, manufacturers!
Challenges in healthcare
5. Provider (hospital, clinic, non-profits, esp. in mental health, pharmacy): EMR, specialty (radiology PACS), quality of care, scheduling and operations, supply chain, reporting, HIPAA training &
Payer (insurers, benefits managers, etc.): benefits transactions, reporting,
Patient (self, family, community, employer): portals, ancestry.com
Clearinghouses & research databases (Universities
National and international organizations (Medicare, CDC, WHO, county health departments)
Healthcare is distributed
6. US Army FM 5-19 – Composite Risk Management
Parallel planning from strategic to tactical
Identify risks, assess, develop controls, implement, supervise & evaluate
Define business model, identify uncertainties, assess, design, execute
Traditional risk management
Probability * Impact = Evaluated Risk Priority
4-stage Crisis Management
How do organizations deal with a crisis
BUSINESS CONTINUITY PLANNING
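The traditional formula on this slide (Probability * Impact = Evaluated Risk Priority) can be sketched in a few lines. This is a minimal illustration, not a risk methodology: the `Risk` class, the 1-5 impact scale, and the example probabilities are all hypothetical. With these made-up numbers it shows why the formula struggles with novel events: a rare, high-impact pandemic ranks below a routine WiFi outage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    name: str
    probability: float  # hypothetical likelihood over the planning horizon, 0..1
    impact: int         # hypothetical severity scale, 1 (minor) .. 5 (catastrophic)

    @property
    def priority(self) -> float:
        # Traditional risk management: Probability * Impact = Evaluated Risk Priority
        return self.probability * self.impact


def rank(risks: List[Risk]) -> List[Risk]:
    """Order risks from highest to lowest evaluated priority."""
    return sorted(risks, key=lambda r: r.priority, reverse=True)


if __name__ == "__main__":
    register = [
        Risk("Novel pandemic", probability=0.01, impact=5),
        Risk("Regional hurricane", probability=0.10, impact=4),
        Risk("Hospital WiFi outage", probability=0.30, impact=2),
    ]
    for risk in rank(register):
        print(f"{risk.name}: {risk.priority:.2f}")
```

Because the score multiplies likelihood into everything, any event we cannot estimate well (or have never seen) drifts to the bottom of the list, which is exactly where business continuity planning stops looking.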
7. ONE APPROACH – “WAR ROOM”
Evaluate which parts “go forward” and then fix gaps
Looks backwards first by looking at projects in flight (bias
towards learned experiences)
Low risk (consensus-based with current participants) but also assumes threats are something we can identify and
More on decision making in crisis
ANOTHER APPROACH –
Take a broad look at how the environment
changed and sense trends before envisioning
the new business model with customers
Looks forward to green shoots internally as
well as lunatic fringe externally, but runs the
risk of untrusted advisors, jeopardizing some
short-term business, and not being able to
transform in time
Can get out of a local minimum for unknown
8. UNPREPARED, WING IT
On a HIMSS call, a German doctor indicated each state was having to build its own infrastructure and reporting on COVID, which really shows that the way we've traditionally done healthcare, doctor- or institution-focused, doesn't make sense when medicine is becoming
GREEN SHOOTS ACCELERATED
WiFi deployed to new hospital areas, Teams and Zoom in high
demand but facing hackers changing data and Zoombombing
Telemedicine moved from 1% to 99% but ditched video
Some local health departments had better plans based on similar
READY FOR NEW NORMAL
South Korea already had a mobile app installed broadly to do contact
What do we observe in healthcare?
The future is already here. It's
just not evenly distributed yet.
~ William Gibson
9. Can’t build a plan ahead of time (this was how many
looked at business continuity)
Destructive competition – prior suppliers, friction in
engaging volunteers (read: organic adaptation) in the
organization, competing organizations
Unknown public policy, governance – not a simple opt-in
Communications is hard!
Systems do not adapt; they only scale as one part of
Linear design process based on decomposition implies
systems can only change so quickly once they are in
Why this doesn’t work for VUCA events
10. New business models take shape
as we cross the chasm
Customer behavior is changing,
unlikely to revert fully
Customer-centric advances as a
We made some half-steps with patient-
centric, consumer-centric, student-
Now we have to go further…
What is the difference between a
COVID response and digital
Can we work backwards from a new normal?
11. Characteristics of complex adaptive systems
Path dependent: Systems tend to be sensitive to their initial
conditions. The same force might affect systems differently.
Systems have a history: The future behavior of a system depends
on its initial starting point and subsequent history.
Non-linearity: Systems react disproportionately to environmental
perturbations. Outcomes differ from those of simple systems.
Emergence: Each system's internal dynamics affect its ability to
change in a manner that might be quite different from other systems.
Irreducible: Irreversible process transformations cannot be reduced
back to the system's original state.
Adaptive/Adaptability: Systems that are simultaneously ordered and
disordered are more adaptable and resilient.
Operates between order and chaos: Adaptive tension emerges from
the energy differential between the system and its environment.
Self-organizing: Systems are composed of interdependent, interacting
parts and of diversity within the system
 – Turner & Baker
What do complex systems tell us?
Functional requirements model may be only part of
what we need to be successful
They are often “discoverable” and have predictable
paths to implementation (“well worn path” or process
We already use the terms non-functional, quality attribute, or ASR
to catch all the other stuff, some of which we know is important,
but which is much more difficult to describe
Continuous adaptation as a means to survival
Similar to Agile’s refactoring as a way to avoid “legacy
debt” and having to have requirements mostly identified
ahead of time
Similar to the business cycle for Kodak and Fuji
Can systems adapt and survive better in the face of
12. “The fitness of the individual is the
probability that the individual will be
included among the group selected
as parents of the next generation”
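The quoted definition treats fitness as a selection probability, which in evolutionary computation is usually implemented as fitness-proportional ("roulette-wheel") selection. A minimal sketch, assuming each candidate design already has a non-negative score (for example from the fitness functions on the Agile design slide); the design names and scores are hypothetical.

```python
import random
from typing import Dict, List


def selection_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Fitness-proportional selection: each candidate's chance of becoming a
    parent is its score divided by the population's total score."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one candidate needs a positive score")
    return {name: score / total for name, score in scores.items()}


def select_parents(scores: Dict[str, float], k: int, rng: random.Random) -> List[str]:
    """Draw k parents with replacement, weighted by score (roulette wheel)."""
    names = list(scores)
    return rng.choices(names, weights=[scores[n] for n in names], k=k)


if __name__ == "__main__":
    # Hypothetical candidate architectures scored by some fitness function.
    scores = {"design_a": 3.0, "design_b": 1.0}
    print(selection_probabilities(scores))  # {'design_a': 0.75, 'design_b': 0.25}
    print(select_parents(scores, k=5, rng=random.Random(42)))
```

The weaker design is not discarded outright; it simply has a smaller chance of contributing to the next generation, which keeps some diversity in the population.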
13. Agile design
Epics / RT / roadmap
Sprint planning issues
How to bake in architecture spikes
How to balance functional &
How to measure value to the
organization in the sequencing
SAFe tries to “scale-up” agile to
deal with inter-project coordination
Continued discussion of what to do
with legacy debt and “bit rot”
Initial architectural guidance around
Verification through “fitness
functions” (measurements) from CD
to test to DevOps
Guidance should exist across
projects and align to organizational
Robust architecture issues
How to construct a fitness
function for certain NFRs
Continued lack of inter-project
Difficulty in measuring / testing
in quality (stressor known
ahead of time)
Antifragile/ adaptable design
Develop initial architecture
Describe information flows
List large number of “stressors”
Calculate “residues” – what survives,
regardless of likelihood of stressor
Redesign to increase residue size
Repeat residue analysis under broader circle
of stressors, possibly composing and
Continue to evolve architecture across
Antifragile architecture provides
Improved way to look at unlikely events
and ones which are non-linear
Views new system as part of an
Dynamic event response pulled into
design phase – no need to wait until
production to have a plan (or rearchitect)
Delays decomposition of system and
traditional risk planning until after design
Design for systems with antifragility in mind
14. Some technologies we’ve piloted may become mainstream (new normal).
Low cost, apply across new parts of the portfolio:
RPA: value in making paper or disconnected processes completely digital (HR hiring / staff planning integrated with LinkedIn, appointment
AI & Bots: customer experience in giving specific, latest advice (over static content), advice in real time to first responders, personal advisors
Remote video & VR: “We’ve talked about it for 20 years”
Digital twin (simulation): predictive modeling, contact tracing
Interoperability standards & data sharing (health “commons”): feeding real-time decision-making dashboards
ML: discovering deep data patterns for antiviral effectiveness and contact tracing; what about unstructured OT data?
Easy collaboration (Hangouts, Teams, etc.): coordination of care, triage
Technologies to help us adapt
15. National (not whole Earth?) level
Organization level (CEO, CMIO, CTO,
1. Identify the new digital
2. Focus on stabilizing short term
3. Only lastly, assume existing
projects need to stay on track
How do I get back to work? How can I
mentally get back to focus when I’m
worried about this impact?
How do I stay safe interacting with
others? Are workplaces physically
How do I deal with personal impact
from this (either COVID or other things
which have happened in same period
How does my profession change?
How will we emerge from this?
Health architecture challenges
Geo-quarantine, with travel bans
By demographic, older people
Large gathering restrictions
Test and release, flare-ups, some sort of
ID showing clear? Some sort of antibody
test? Some sort of immunity for how
People have resisted stay at home, will
get worse as pandemic walking outdoors
Police intervention? Socialized policy
timetable... Tracking by mobile phone,
specific hall passes for certain types of
Some patients who have been on ventilators have cognitive issues after they recover (post-ICU syndrome)
* This is my conjecture only
16. How can we get back to work?
Requirements for a scalable system (millions of workers),
demonstrable proof of being virus-free, plus an extra
requirement… what if there is a resurgence of cases?
The question you’ve all been waiting for…
• Define the
• Build a mobile app
• Roll it out
• Define the
• Build a mobile app
• Get patient profile
from central SOR
• Automate some parts
for efficiency (don’t
ask “Are you male?” if
you already know…)
• Build a mobile app
• Provide a feedback
loop from hospitals (if
the app said I was
clear, and I wasn’t…)
• Design for preplanned
• Re-roll it out!
• Design with residual
analysis based on
and small MVPs to
see what survives
build parts that help
stressors observed in
• Set the path for
• Implement broader
triage mechanism to
detect if a new
disease is present
17. You guessed it…
Build a habit around this
Specifically, be looking further afield, doing low-cost MVPs, and
incorporating antifragile planning as a co-equal in design
The uncomfortable questions are the ones which, when answered,
lead us to adaptability
What mindset do we need to adapt?
18. APQC guide on COVID preparedness
Customer journey mapping in healthcare example
Royal Society on disease modelling and surveillance
U.S. CDC site, JHU multi-sourced reporting site, Microsoft
WaPo visualization, Moscow Times, IHME predictive model, other data sets and here
Bill Gates 2015 talk & how Teams/remote meetings change things
IBIS outlooks by industry & CO, social distancing impact
Testing resources – BEI, NIH, Verily, CVS, Everly and strategy, Conduent
Computational biology labs & ML
HK/Singapore response, TraceTogether, BlueTrace.io
Residual design & antifragile (Barry O’Reilly)
For further reading
19. An Association for All IT Architects
For further information, please direct questions to email@example.com or
We’ll cover along the way:
Typical health IT systems (primer) across provider, payer, patient
Digital transformation (DT) has changed during the pandemic, or more accurately accelerated, towards patient centricity, but not along the plan we had before
What is the difference between traditional design, design for robustness, and design for antifragility?
One form of digital transformation would be to eliminate or reduce in-person experiences, making remote experiences on par with them
Let’s review where we are as of 30 April. A definite sense of change and a lot of questions going forward. Business and technology leaders are looking around for guidance.
The next level down beyond statistics is understanding the patterns underneath. We know, for instance, that…
Businesses are starting from different spots: some industries had “pre-existing conditions,” trends indicating that the current business model was unsustainable.
Some businesses had experiments, green shoots that could be accelerated (a plan that could be put into action); they just had not been taken seriously.
Some businesses have not done strategic planning and may not make it through with us in the same shape. (McKinsey puts it this way: 18% will thrive on a new business model, and about the same number will cease to exist.)
NAM – Conference Board
What specifically is going on in healthcare? We’ve seen a lot of these challenges in the news and with our colleagues.
Let’s take a look at what systems we’re talking about, just as a refresher.
Healthcare systems are both a complex portfolio of data flows and distributed across organizations; we will call this a complex system. This causes us as architects to look at our organizations’ systems in the context of changes to our partners as well: a system of systems. How do organizations maintain continuity when certain systems and orgs are under stress? Does the whole ecosystem collapse?
https://royalsocietypublishing.org/doi/full/10.1098/rstb.2018.0276, image from Royal Soc. May 2019, Outbreak analytics: a developing data science for informing the response to emerging pathogens
These are traditional, linear processes for emergency operations. Our business continuity planning takes a lot of preparation for known scenarios (natural disasters) and does not function well when we have a new type of occurrence (or one we’ve only seen in particular domains at a limited impact). Most universities, for instance, had NO plan to go online. Ask yourself: how good a solution can you come up with on the fly when you do not have even a similar plan? This talk is about how to ideate better.
Army - https://www.globalsecurity.org/military/library/policy/army/fm/5-19/fm5-19.pdf
Gartner - https://www.securitymagazine.com/articles/91988-gartner-a-5-phase-approach-for-resilient-business-continuity-models-during-coronavirus-disruptions
Traditional - https://www.mitre.org/publications/systems-engineering-guide/acquisition-systems-engineering/risk-management/risk-impact-assessment-and-prioritization
In healthcare specifically, we are used to fallback written manual procedures if WiFi goes out: a very fragile solution designed for short-term BC (it assumes we have time to fix or redevelop, or that the condition will go away quickly).
These are the two ends of the spectrum: one where we get information flow from our existing sources and assume the ideas we had previously may need to be tweaked; the other where we start with the premise that even what we thought of the environment was wrong, that we missed some signals, and that we need to look broadly for new inspiration before continuing ANYTHING from before the event
Left side is ego-driven, top-down, assumes control; right side is team-driven/internally honest about limitations
Left side uses existing resources to plan its way out; right side is open to external advisors (outside experiences, often further afield, or green shoots) and re-questions assumptions (in case they’ve changed)
Most orgs lean towards the model on the left: risk-averse going into a crisis.
https://www.eadirections.com/2020/04/leaders-your-journey-through-covid-19/ has some good advice on looking at these two models (graphic courtesy of EA Directions).
We see the same patterns in healthcare IT systems as we see at the macro level. This slide shows different orgs prepared to different levels.
We always say “Don’t let a good crisis go to waste,” so what do we learn from this? There was a huge gap in looking around for what was succeeding and making it our own.
Telemedicine may “stick” and be preferred over in-person - https://www.linkedin.com/pulse/covid-19-needed-telemedicine-finally-go-mainstream-mesk%C3%B3-md-phd/
Elective surgery used to fund the ICU; what if the ICU is 100% of the business? It is not funded correctly if that’s the case.
Delivery, or packages at the curb, over brick & mortar
Are we ready for engaged experiences like technology consulting over Zoom?
This is an organizational and systemic issue. A strategy that only works for my group ignores the nonlinear inputs from other orgs. An architecture strategy that assumes my system exists in a vacuum ignores the fact that parts supplied from outside my control may be required and may cause me to fail, even if I would scale normally.
Image - https://www.cultofmac.com/702095/apple-and-googles-contact-tracing-api-could-be-welcomed-by-the-european-union/
We see part of the future now, but not all of it. This makes traditional gap analysis difficult (no complete future state, no bridging patterns).
Next, let’s look at complex systems starting with some theory and orientation.
Complexity allows us to think of systems of systems which do not have linear responses to each other (they change around us) based on environmental stimuli
Radical change of operations on continuum
5% telemedicine to 100%
Facilities unusable – “meet me out back”
References on complex systems:
Avancier - http://grahamberrisford.com/AM%204%20System%20theory/Sense%20and%20nonsense%20in%20system%20theory.htm
Evolutionary architecture - https://books.google.com/books/about/Building_Evolutionary_Architectures.html?id=qYI2DwAAQBAJ&printsec=frontcover&source=kp_read_button#v=onepage&q&f=false
In evolutionary, or “adaptive,” architecture, we can use a fitness function instead of a code review to assess how well the design undergoes change and meets the needs of the system requirements in ASR areas. This lets us go beyond treating microservices as universally good, to understand how changes to coupling, module design and decomposition, and platform layers & boundaries affect the overall -ilities of the system. When we say replacing code reviews, we are changing from an experiential model (senior developer to junior developer) to establishing the “rules” for how decomposition will be accomplished. The developer, the DevOps engineer, and the product owner should be able to interpret these rules in their work product. The challenge for the architect is then how to measure the functions over time for any given significant requirement (or quality attribute).
Architecture of systems came from a desire to control the outcome in a creative process; new methods reexamine this premise. Agile design moves to multi-skilled individuals working closely together to deliver faster than specialized, distributed teams. We do not yet have a way to consistently deal with systems of systems or, in this case, to build resilience into systems. This again comes from a tension between the desire to control outcomes and the desire to future-proof systems. Looking at how agile plays out, we have a couple of minor changes, like using microservices instead of N-tier, which offer potential gains in certain attributes like performance or throughput; but fundamentally we do a little architecture all the way through the process in an iterative way (not much different from waterfall). Agile makes a clear distinction, by role and artefacts, between the short-term project and longer-term goals (architecture), usually prioritizing the project over these design aspects, which become more important over time.
Agile is fundamentally evolutionary design when compared to waterfall. We acknowledge that we do not know the details of all the tasks to be done at the start of the effort, but prioritize regularly to zero in on the final work product. It is very developer-led and not very aware of systems of systems.
Evolutionary design adapts architecture to consider, in the design, stressors known ahead of time, building in resilience in advance for certain dimensions that are very difficult to measure. This creates initial guidance in these areas: guidelines for the team to measure against (with fitness functions) as the project proceeds.
Antifragile design takes this a step further to build in resilience for changes in dimensions we may not anticipate needing ahead of time. This extends the theory we had in ATAM/FMEA. The residue is the part of a system that is left after a stressor hits it.
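To make the idea concrete, here is a minimal fitness function for one -ility, coupling, written so it could run as an ordinary test in a CD pipeline. The module names, the three-layer rule, and the fan-out limit of 3 are hypothetical; in a real codebase the dependency map would be extracted from the code or build graph rather than written by hand.

```python
from typing import Dict, List, Set

# Hypothetical layering rule: lower numbers are lower layers, and a module
# may depend only on modules in its own layer or a lower one.
LAYER_RANK = {"domain": 0, "service": 1, "ui": 2}


def layering_violations(deps: Dict[str, Set[str]], layer_of: Dict[str, str]) -> List[str]:
    """List every dependency that points 'upward' to a higher layer."""
    violations = []
    for module, targets in deps.items():
        for target in targets:
            if LAYER_RANK[layer_of[target]] > LAYER_RANK[layer_of[module]]:
                violations.append(f"{module} -> {target}")
    return sorted(violations)


def coupling_fitness(deps: Dict[str, Set[str]], layer_of: Dict[str, str],
                     max_fan_out: int = 3) -> bool:
    """Pass only if there are no upward dependencies and no module
    depends directly on more than max_fan_out others."""
    worst_fan_out = max((len(t) for t in deps.values()), default=0)
    return not layering_violations(deps, layer_of) and worst_fan_out <= max_fan_out


if __name__ == "__main__":
    layer_of = {"scheduling_ui": "ui", "appointments": "service",
                "billing": "service", "patient": "domain"}
    deps = {"scheduling_ui": {"appointments"},
            "appointments": {"patient"},
            "patient": {"billing"}}  # domain reaching up into a service
    print(layering_violations(deps, layer_of))  # ['patient -> billing']
    print(coupling_fitness(deps, layer_of))     # False
```

The point is less the specific rule than the shift it represents: the architect encodes the decomposition rule once, and every build measures it, instead of relying on a senior reviewer to spot the violation.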
A true call to action to design differently
Microservices work sometimes… no different than Open Source
Look for decoupling and ability to have failure in a component (which redundancy does not mask)
Challenge to the architect is how you get these to the whole population and not worsen the experience (digital divide, ethical issues, other conversations we’ve avoided for a long time in systems resilience)
How do we leverage technology green shoots?
NORA Bot - https://www.nordea.com/en/press-and-news/news-and-press-releases/news-en/2018/how-our-robots-are-helping-customers.html
Conversa integrated healthtool and chats - https://conversahealth.com/
Teladoc - https://www.wsj.com/articles/teladocs-remote-doctor-visits-surge-in-coronavirus-crisis-11586894400?shareToken=st9f082c3427fa484897aae5c8d099f133&reflink=share_mobilewebshare
Avizia/AmWell - https://business.amwell.com/solution-overview/ & mobile app - https://amwell.com/cm/how-it-works/
Digital twin - https://medicalfuturist.com/digital-twin-and-the-promise-of-personalized-medicine/
https://medicalfuturist.com/ten-ways-technology-changing-healthcare/ for more
AI – LifePod - https://lifepod.com/
We answer this on at least three different levels of thought, all of which are interdependent. For example, when would you consider it safe to go to a restaurant with your friend? It depends on your situation, certainly, on what the restaurant might have done (a Papa Johns advertisement said no one touches the pizza with bare hands after it leaves the oven), and on what national guidance might be. Would you wear a mask? Would you do that if the hospitals were full, or if there were empty beds? Context matters in organic systems: what if there were a preventive shot you could take to inoculate yourself?
Let’s take a personal example: an ER visit. If we had a system to represent this, it might be some sort of facilities, provider, and patient record system. The stressors during the return-to-work period are changing dynamically. What if we see a rise in work absences among ambulance drivers, ineffective initial in-home testing, a reported process or equipment failure (which turns out not to be true), or a hacked EMR system that routes billing to a non-existent account? We may have more time than in the original crisis to spend on design, and we may have learned lessons from that previous period, so we can actually start buying down the overall risk level (unknowns) for the next version, making it more adaptable. This broadening of stressors beyond normal modeling will continue in later versions and will also help make our system more robust. It is then part of the path to the next stable plateau, a new normal.
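The residue steps on the Antifragile/adaptable design slide can be applied to this ER example. What follows is a deliberately simplified sketch, not Barry O’Reilly’s full residuality method: the components, their dependencies, and the stressor-to-component mapping are all hypothetical, and "residue" is counted as the components that keep working after a stressor's failures propagate through dependencies. Stressors are evaluated regardless of likelihood, then the design is changed and the residues compared.

```python
from typing import Dict, Set


def residue(components: Set[str], depends_on: Dict[str, Set[str]],
            knocked_out: Set[str]) -> Set[str]:
    """Components still working after a stressor knocks some out.
    Failure propagates to anything that depends, directly or transitively,
    on a failed component."""
    failed = set(knocked_out)
    changed = True
    while changed:
        changed = False
        for comp in components - failed:
            if depends_on.get(comp, set()) & failed:
                failed.add(comp)
                changed = True
    return components - failed


COMPONENTS = {"ambulance_dispatch", "home_testing", "er_triage", "emr", "billing"}

# Hypothetical stressors from the ER example, each mapped to what it disables.
STRESSORS = {
    "ambulance driver absences": {"ambulance_dispatch"},
    "ineffective in-home testing": {"home_testing"},
    "false equipment-failure report": {"er_triage"},
    "hacked EMR rerouting billing": {"billing"},
}


def average_residue(depends_on: Dict[str, Set[str]]) -> float:
    """Mean number of surviving components across all listed stressors."""
    sizes = [len(residue(COMPONENTS, depends_on, hit)) for hit in STRESSORS.values()]
    return sum(sizes) / len(sizes)


if __name__ == "__main__":
    # Hypothetical initial design: triage only accepts ambulance arrivals.
    initial = {"er_triage": {"ambulance_dispatch", "emr"}, "billing": {"emr"}}
    # Hypothetical redesign: triage also accepts walk-ins, so it no longer needs dispatch.
    redesigned = {"er_triage": {"emr"}, "billing": {"emr"}}
    print(average_residue(initial))     # 3.75
    print(average_residue(redesigned))  # 4.0
```

Composing stressors, as the slide suggests for later rounds, amounts to passing the union of their knocked-out sets to `residue`; widening the stressor list in each version is what gradually buys down the unknowns.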
National level – super-organizations
National or trans-national healthcare systems (interoperability, democratized access similar to Ryan White?)
What does the new customer journey look like?
Make the lessons learned in the crisis become the new digital normal (err on the side of pushing the envelope towards the new model); which things did we (or peers) have in flight that worked extremely well?
Look to the transformation in our own org which made it more efficient during the crisis: what did we learn?
Cash flow – what do I depend on to keep the old business running (suppliers, interactions, etc. from BMC); operations can take a back seat and fill in between #1 strategy meetings; the value add in continuing these is limited as customers and partners are making decisions separately
Projects - (many may be evaluated in light of 1 or 2) – some customers may be recovering themselves and no longer be interested/prioritizing these, or the projects may need re-validation of value proposition – may be obsolete work; previous projects selected as best in breed may not have same $$ value, and ones prioritized for lowest cost may have changed too
Business flow changes (HR, finance, sales, creating products & services, R&D/innovation)
How do I identify which talent should be shed? How do I rebuild talent? How do we communicate to all of our staff and suppliers?
Is testing a “benefit” I would offer?
How do I protect the communities we work in?
Should I offer deals to customers who are struggling? Postpone payments, or create incentives which might be free or divert cash flow to NGOs?
HR work remotely… https://youtube-creators.googleblog.com/2020/03/protecting-our-extended-workforce-and.html
Two things going on at once in a VUCA event: the business changing dynamically, and the system we’re trying to get out is not stable
What experiments do we need to do to answer this question with a technology-based solution? For example, the town of Vo’ in Italy, which has been isolated and is fairly homogeneous, could tell us the time duration of a contained outbreak and could be used to test an “immunity passport” within the rest of Italy.
Image courtesy of NYT and WSJ (mobile app is a band plus phone app being tested in South Korea)
Some questions about requirements:
Who certifies workers to return? Healthcare provider (who has the test), self, state/national orgs? How do we certify that (antibody tests?)
What about other controls like travel restrictions, quarantine for international visits…
What is acceptable risk? How to avoid workers comp claims as employer? (what is safe workplace)
What if COVID is rolling curves of infection (relapses or mini-pandemics)?
Which industries will come out first? What dependencies will make this slower because we optimized our supply chains (SC) for only certain events?
Doctors do this fourth step but are not always aware of it themselves: Menard in New Orleans noticed a high number of flu patients coming in during January; without a test for coronavirus they miscoded them, but knew something was different.
Before we say we can’t do this, we may want to look at how test kits can actually get much more data than just symptoms (often multi-respiratory tests that identify different viruses; even for unidentified ones, known viruses can be crossed off the list)
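The fourth column of the return-to-work slide, a broader triage mechanism to detect if a new disease is present, is essentially automating the hunch the New Orleans clinicians had. A minimal sketch, assuming weekly counts of flu-like visits whose multi-respiratory panel came back negative are available; the counts and the 3-standard-deviation threshold are hypothetical, and a simple mean-plus-k-sigma rule stands in for proper syndromic surveillance methods rather than representing a validated one.

```python
from statistics import mean, stdev
from typing import List


def unexplained_excess(baseline: List[int], current: int, k: float = 3.0) -> bool:
    """Flag a possible new pathogen when this week's count of flu-like visits
    whose multi-pathogen panel came back negative is more than k standard
    deviations above the mean of the baseline weeks."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline weeks")
    threshold = mean(baseline) + k * stdev(baseline)
    return current > threshold


if __name__ == "__main__":
    # Hypothetical weekly counts of "flu-like, panel negative" ER visits.
    baseline = [10, 12, 9, 11, 13, 10, 12, 11]
    print(unexplained_excess(baseline, current=13))  # False
    print(unexplained_excess(baseline, current=25))  # True
```

Counting only the visits that known tests cannot explain is what turns the multi-respiratory panel into a signal: the known viruses are crossed off, and whatever remains is the part worth watching.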
Are we being Cassandra? Or is this easier in hindsight (i.e., only thing we have to do is put into project plans?) No, I think it’s a bit more of a call for architects to look around the fringes and bring technologies into the mainstream. The next pandemic will look different, so just solving for what is happening today is not adaptation, it is just making one aspect more robust.
Not saying this is the solution or even one solution, but as we design systems, we have some technologies that may be used – with more data feeds than we’re used to, more integrations, more cross-organizational sharing (or drop-outs) – to be able to build these resilient systems.
Shed fear of ambiguity over the process we’ve learned from the last way of working (we’ve learned over years through experiences, which preserve ourselves but cut off ways of thinking that modify the model). The customer is radically changing (or moving ahead) in this period and unlikely to go back to previous modes. Your organization can choose to go forward. As a personal practice of architecture, this means we do not treat the experiments as a side job (lesser) to large projects simply because they are comfortable; the experiments hold the key to adaptation and higher business value.
Free of quarterly earnings, propose different staffing and resourcing
Test & learn – truly fail, and truly learn from failure
Observe interactions and customers
Everything through the lens of customer value; even small changes can get a 10x return (avoid big projects with 1.5x ROI, which usually waste capital on incrementally getting back to the old way of working)
Supply chain resilience has new meaning
By industry, BusinessInsider - https://www.businessinsider.com/coronavirus-recession-industries-hit-chart-2020-3
Brookings - https://www.brookings.edu/blog/the-avenue/2020/03/17/the-places-a-covid-19-recession-will-likely-hit-hardest/
Accenture on consumer goods - https://www.accenture.com/us-en/insights/consumer-goods-services/coronavirus-consumer-goods-rapid-response (note broadening of supply chain surveillance)