Using APIs in the enterprise for reporting, data science and systems integration. Originally presented at the 2015 API Strategy & Practice conference, Nov. 20, in Austin, TX.
In today's fast-paced digital marketplace, no company is entitled to its business model.
The origins of web APIs lie in Amazon's need to improve its operations, and in particular to rationalize its internal systems integration. As a native web business, Amazon naturally leveraged web technologies. Since then, APIs have taken off on the web, driven by pragmatic developers working on e-commerce and digital marketing solutions. That work moves fast and is relatively inexpensive. Meanwhile, billions of dollars are spent every year on enterprise systems integration.
In this deck we show how to leverage resource-oriented APIs to overcome data silos, multiply the return on investment in databases, and create a web of data assets consumable not only by software developers but also by business analysts, data scientists and management.
3. 2002 at Amazon
• All teams will henceforth expose their data and functionality through service interfaces.
• Teams must communicate with each other through these interfaces.
• There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
• It doesn’t matter what technology they use.
• All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
• Anyone who doesn’t do this will be fired.
• Thank you; have a nice day!
– Jeff Bezos
5. 2015 Global IT Spend: $2.3T
Source: Forrester Research, Global Tech Market Outlook for 2015-2016 (via ZDNet)
6. ETL/Data Warehousing
Analytical Systems
• Data duplication
• Stale data
• Brittle overnight feeds
• Central bottleneck
• Does not scale out
• Neither easily accessible nor searchable
8. Store It All in One Place?
• It was already hard with only on-premises systems
• An illusory idea with today’s cloud apps
• Try it with your contact list, for starters…
10. What Is Resource Oriented Architecture (“ROA”)?
• “Style of software architecture and programming paradigm for designing and developing software in the form of resources with RESTful interfaces.” – Wikipedia
• Uniform data access layer to all data assets in their unobstructed form, for reading and writing in various representations. – my take
11. What Is Resource Oriented Architecture?
Service Oriented
• Represents Action
• Transaction, Unit of Work
• Message
• API controlled by functional design
• Harder to adapt and scale beyond “enterprise”
• Harder to deprecate functionality
Resource Oriented
• Represents State
• Addressable Resource
• Update to Resource
• API automatically evolves with data
• Harder to model into complex transactions
• Clients must be resilient to change
12. API Shell Over Data
• Single access point, but without copying data
• Self-service reporting, data feeds or integration with NoSQL
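To make the “without copying data” point concrete, here is a minimal sketch of an on-demand facade using Python's standard `sqlite3` and `wsgiref` modules. The in-memory `Customer` table, the `/Customer/<id>` route and the `get` helper are hypothetical stand-ins for a real system of record; this is not SlashDB's implementation. Each request queries the database at that moment, so an update to the source shows up on the very next request, with no overnight feed to rerun.

```python
import json
import sqlite3
from wsgiref.util import setup_testing_defaults

# Hypothetical system of record: an in-memory SQLite table (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Country TEXT)")
conn.execute("INSERT INTO Customer VALUES (1, 'Brazil')")
conn.commit()


def not_found(start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def app(environ, start_response):
    """Read-only facade: GET /Customer/<id> returns that row as JSON.

    The row is read at request time, so no copy of the data is kept and
    every response reflects the current state of the database.
    """
    parts = environ["PATH_INFO"].strip("/").split("/")
    if (environ["REQUEST_METHOD"] != "GET" or len(parts) != 2
            or parts[0] != "Customer" or not parts[1].isdigit()):
        return not_found(start_response)
    row = conn.execute(
        "SELECT CustomerId, Country FROM Customer WHERE CustomerId = ?",
        (int(parts[1]),),
    ).fetchone()
    if row is None:
        return not_found(start_response)
    body = json.dumps({"CustomerId": row[0], "Country": row[1]}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def get(path):
    """Invoke the WSGI app in-process (no network) and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD="GET", wsgi.* keys, etc.
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")
```

To serve the same `app` over HTTP, it could be passed to `wsgiref.simple_server.make_server(host, port, app)`. A real deployment would also need authentication, write support and multiple representations, which this sketch deliberately omits.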
13. Database Content as HTTP Resources
http://demo.slashdb.com/db/Chinook/Customer/CustomerId/1.html
• demo.slashdb.com: service location, on the intranet or in the cloud
• /db/Chinook: the /db prefix followed by the database name. Supported RDBMS: MS-SQL, Oracle, MySQL, PostgreSQL and more
• /Customer: table to query
• /CustomerId/1: field to filter on and value to look up (text, number or date); several filters can be combined
• .html: data format (XML, JSON, HTML or CSV)
/db automatically makes hyperlinks directly to data.
Related records are hyperlinked and are thus search-engine ready.
Filtering, drill-down and slicing are natural, and URLs stay clean.
Custom queries are also possible (SQL Pass-thru).
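The URL anatomy above can be sketched as a small parser that turns a path into a parameterized SQL query. The grammar `/db/<database>/<table>[/<field>/<value>]...[.<format>]` is inferred from the example URL and the speaker notes; `parse_resource_path`, `to_sql`, the `html` default and the identifier pattern are hypothetical and do not describe how SlashDB actually works. Filter values are passed as bound parameters, while table and column names, which cannot be bound, are checked against a conservative identifier pattern.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import unquote

# Assumed grammar, inferred from the example URL:
#   /db/<database>/<table>[/<field>/<value>]...[.<format>]
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
FORMATS = {"xml", "json", "html", "csv"}


@dataclass
class ResourceQuery:
    database: str
    table: str
    filters: list = field(default_factory=list)  # (column, value) pairs
    fmt: str = "html"


def parse_resource_path(path: str) -> ResourceQuery:
    segments = [s for s in path.split("/") if s]
    if len(segments) < 3 or segments[0] != "db":
        raise ValueError("expected /db/<database>/<table>/...")
    fmt = "html"  # assumed default when no extension is given
    stem, dot, ext = segments[-1].rpartition(".")
    if dot and ext in FORMATS:
        segments[-1], fmt = stem, ext
    database, table, rest = segments[1], segments[2], segments[3:]
    if len(rest) % 2:
        raise ValueError("filters must come in <field>/<value> pairs")
    # Values are percent-decoded; they stay strings (type handling left to the database).
    filters = [(col, unquote(val)) for col, val in zip(rest[0::2], rest[1::2])]
    # Table and column names cannot be bound parameters, so whitelist their shape.
    for name in [database, table] + [col for col, _ in filters]:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    return ResourceQuery(database, table, filters, fmt)


def to_sql(q: ResourceQuery):
    """Build a parameterized SELECT using qmark placeholders (the sqlite3 style)."""
    sql = f'SELECT * FROM "{q.table}"'
    if q.filters:
        sql += " WHERE " + " AND ".join(f'"{col}" = ?' for col, _ in q.filters)
    return sql, [value for _, value in q.filters]
```

For the slide's URL, `to_sql(parse_resource_path("/db/Chinook/Customer/CustomerId/1.html"))` yields `'SELECT * FROM "Customer" WHERE "CustomerId" = ?'` with the parameter `["1"]`. Other database drivers use different placeholder styles, and the SQL pass-thru option is not covered here.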
14. Best Practices
• Don’t forget about the “R” (Representational) in REST
– JSON isn’t the only data format
• URLs should be easy to understand
– Avoid inventing a mini query language
• Resources should be easy to discover
• Ideally, every resource address should allow reading and writing
• Avoid using the query string to address data
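The first practice, serving one resource in several representations, can be sketched with Python's standard `json`, `csv` and `xml.etree.ElementTree` modules. The `render` function and its format names are illustrative assumptions mirroring the extensions in the slide 13 URL, not a SlashDB API. It assumes every record has the same keys and that those keys are valid XML element names.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def render(rows, fmt):
    """Serialize the same list of records (dicts) as JSON, CSV or XML."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        if not rows:
            return ""
        buf = io.StringIO()
        # Column order follows the keys of the first record.
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "xml":
        root = ET.Element("rows")
        for row in rows:
            item = ET.SubElement(root, "row")
            for key, value in row.items():
                ET.SubElement(item, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```

Because the representation is chosen at the edge, an analyst can open the CSV in a spreadsheet while a program consumes the JSON, both from the same underlying resource.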
16. Use Case: Bank - Regulatory Risk Management
• Federal Reserve CCAR
• Basel Independent Review
• Supervisory Formula Approach (SFA)
• Dodd-Frank Annual Stress Test
17. 2015, Global Bank
• “Upwards [of] 50% of my time goes into data reconciliation efforts.”
• “The biggest pain is sharing data between Python, R, etc.”
• “The problem is - there should be one specified entry point for data.”
• “Consistency of column names and possible values between different versions of the data.”
• “There are a lot of holes in the data process. I think the #1 priority would be creating a good schema.”
• “Finding what you need in this zoo. (…) Currently this is done by talking to people!”
18. Data Science Process
• Data acquisition, storage, discovery and mining, statistical learning, machine learning, predictive analytics, risk modeling
• Competency chasms at every step
19. Implementation: SlashDB API
• Model Research & Dev.: use any programming language
• Reports & Visualization: deliver now, anticipate the future
• Unobstructed Data Sharing: standard formats, HTTP delivery
• Disparate Data Sources: loan portfolios, macroeconomic data, risk metrics, market data
At the core: an automatic, multi-representational, resource-oriented, hypermedia and search-engine-friendly data API & cache.
20. Resource Oriented API Solves Many of the Issues
• Single access point that’s easy to work with
• Combines the best features of plain files (simplicity) and databases (data integrity)
• Has authentication, authorization and encryption
• Pragmatic data access for people and programs
• Search engine ready
21. Searchable API
• Users know what they need, but may not know where to find it
• A true hypermedia API should contain hyperlinks to related resources
• Crawling and indexing by a search engine is trivial when all resources are hyperlinked
• Try it yourself at: http://demo.slashdb.com/search.html (e.g., search for “customers from Brazil”)
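Why hyperlinks make indexing trivial can be sketched as a breadth-first crawl: starting from one entry point, follow every link once, then build an inverted index over the field values. The `RESOURCES` dictionary below is made-up sample data standing in for HTTP GET responses, and `crawl`, `build_index` and `search` are hypothetical helpers; a real search engine ranks results far more carefully than this term-count toy.

```python
from collections import deque

# Made-up sample resources: each has data fields plus links to related resources.
RESOURCES = {
    "/db/Chinook.json": {"data": {}, "links": ["/db/Chinook/Customer.json", "/db/Chinook/Invoice.json"]},
    "/db/Chinook/Customer.json": {"data": {}, "links": [
        "/db/Chinook/Customer/CustomerId/1.json", "/db/Chinook/Customer/CustomerId/2.json"]},
    "/db/Chinook/Customer/CustomerId/1.json": {"data": {"CustomerId": 1, "Country": "Brazil"},
                                               "links": ["/db/Chinook/Invoice/CustomerId/1.json"]},
    "/db/Chinook/Customer/CustomerId/2.json": {"data": {"CustomerId": 2, "Country": "Germany"}, "links": []},
    "/db/Chinook/Invoice.json": {"data": {}, "links": ["/db/Chinook/Invoice/CustomerId/1.json"]},
    # Links back to the customer, forming a cycle the crawler must handle.
    "/db/Chinook/Invoice/CustomerId/1.json": {"data": {"BillingCountry": "Brazil"},
                                              "links": ["/db/Chinook/Customer/CustomerId/1.json"]},
}


def crawl(start, fetch):
    """Breadth-first traversal of hyperlinks; each URL is fetched at most once."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        resource = fetch(url)
        if resource is None:
            continue
        pages[url] = resource["data"]
        for link in resource["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Inverted index: lowercase token -> set of URLs whose values contain it."""
    index = {}
    for url, data in pages.items():
        for value in data.values():
            for token in str(value).lower().split():
                index.setdefault(token, set()).add(url)
    return index


def search(index, query):
    """Rank URLs by how many query terms they match; unknown terms are ignored."""
    hits = {}
    for term in query.lower().split():
        for url in index.get(term, ()):
            hits[url] = hits.get(url, 0) + 1
    return sorted(hits, key=lambda u: (-hits[u], u))
```

Here `crawl("/db/Chinook.json", RESOURCES.get)` reaches all six resources from a single entry point, and searching for “customers from Brazil” returns the Brazilian customer and that customer's invoice, since only the term “brazil” appears in the data.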
22. Resource Oriented API Is a Sensible Investment
• Multiply returns on investments already made in databases (the other ROA)
• Avoid the pitfalls of file-based data sharing
• Avoid the dangers of direct database access
• Avoid the opaqueness of ESB, RMI, SOAP, CORBA, etc.
• Attract top developers (they want to work on cool stuff, and they don’t know databases)
23. MAKE ENTERPRISE GREAT AGAIN
Presentation by:
Victor Olex
@agilevic
victor@slashdb.com
24. Credits & References
• S&P Churn 2002-2012
“Creative Destruction Whips through Corporate America”
by Richard Foster, Innosight
http://www.innosight.com/innovation-resources/strategy-innovation/upload/creative-destruction-whips-through-corporate-america_final2015.pdf
• 2002 at Amazon
“The Secret to Amazon’s Success Internal APIs”
by Kin Lane, API Evangelist
http://apievangelist.com/2012/01/12/the-secret-to-amazons-success-internal-apis/
• Flattening the Competition
Google Finance, chart prepared by V. Olex
https://www.google.com/finance?q=amzn
• 2015 Global IT Spending
“Want money for that new project? Then it's time to go on a moose hunt”
by Steve Ranger, ZDNet
http://www.zdnet.com/article/want-money-for-that-new-project-then-its-time-to-go-on-a-moose-hunt/
• SaaS Revenue Projections
“Enterprise software spend to reach $620 billion in 2015: Forrester”
by Natalie Gagliordi, ZDNet
http://www.zdnet.com/article/enterprise-software-spend-to-reach-620-billion-in-2015-forrester/
• What is Resource Oriented Architecture
Wikipedia
http://en.wikipedia.org/wiki/Resource_oriented_architecture
• Data & Analytics: Benefits & Challenges
“5 Insights & Predictions On Disruptive Tech From KPMG's 2015 Global Innovation Survey”
by Louis Columbus
http://www.forbes.com/sites/louiscolumbus/2015/11/08/5-insights-predictions-on-disruptive-tech-from-kpmgs-2015-global-innovation-survey/
• Data Science Process
https://en.wikipedia.org/wiki/Data_science
• Other graphics
Photographs of D. Trump: Flickr and public domain sources
• Logos and other trademarks are the property of their respective owners; used here for illustration purposes only, no association or endorsement implied.
Editor's Notes
Creative Destruction Whips Through Corporate America: the average lifespan of a company in the S&P 500 was 61 years in 1958; now it is 18 years. No company is entitled to its business model.
Traditional data warehousing and ETL cannot really cope with the issue, because ultimately they just create copies of data. Systems of record change over time, and feeds need to be regularly maintained and monitored.
Innovation in databases, however, has gone back to the old, albeit improved, idea of the key-value store (think dbm, developed at AT&T in 1979), giving us the NoSQL movement: reportedly infinitely scalable, but even harder to weave into the enterprise information flow. Innovation tends to favor bigger and faster, which is great but not always the most pragmatic direction. Meanwhile, the powerful sharing (engaging) nature of the web has been enabled by a simple idea: abstracting information resources as URLs, transmitted over a relatively low-performance hypertext protocol.
In order to achieve this level of interaction, we propose a new way to think about data integration.
Besides (or instead of) overnight data feeds into data warehouses, we need a light, on-demand facade that abstracts databases, tables and records into online resources.
Those resources have to be accessible to both software engineers and domain knowledge workers (data scientists, business intelligence analysts, quantitative analysts, salespeople). We call this a Resource Oriented Architecture.
Our solution has been to automatically hyperlink all the data in order to abstract it as online resources, much like web pages, except built out of systems of record.
What you are seeing is an actual URL from /db. It is easy to understand what it represents. The host name could be local to your intranet or remote in the cloud. Next comes /db, followed by a database name. After that we see a table name followed by a field and value pair, which constitutes a filter on the table. You can have more than one of these. Lastly comes the desired data format. It is also worth pointing out that related records are linked and can therefore be crawled by a search engine. For brevity's sake we cannot show all URL options on this slide, but I will show more in the demo. Where automatic URLs are not sufficient, there is also an option to map custom SQL queries to a URL.