14. How Much Data Do We Generate Every Day?
• 2.5 quintillion bytes of data are created each day at our current pace, but that pace is only accelerating with the growth of the Internet of
Things (IoT).
• 90 percent of the data in the world was generated over the last two years alone.
• Internet:
• More than 3.7 billion humans use the internet (that’s a growth rate of 7.5 percent over 2016).
• On average, Google now processes more than 40,000 searches EVERY second (3.5 billion searches per day)
• Social media (every minute):
• Snapchat users share 527,760 photos
• More than 120 professionals join LinkedIn
• Users watch 4,146,600 YouTube videos
• 456,000 tweets are sent on Twitter
• Instagram users post 46,740 photos
• Communication (every minute): 16 million text messages, 156 million emails and 103,447,520 spam emails are sent, and there are 990,000 Tinder
swipes; worldwide, there are expected to be 2.9 billion email users by 2019.
• Services (every minute): The Weather Channel receives 18,055,556 forecast requests, and Uber riders take 45,788 trips!
• IoT
• Source: https://web-assets.domo.com/blog/wp-content/uploads/2017/07/17_domo_data-never-sleeps-5-01.png
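The rates above mix per-second, per-minute and per-day units. A quick arithmetic check in Python shows how they relate, assuming a 24-hour day; the YouTube line assumes the social-media figures are per-minute rates, and the helper names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440

def per_day_from_per_second(rate):
    """Scale a per-second rate to a per-day total."""
    return rate * SECONDS_PER_DAY

def per_day_from_per_minute(rate):
    """Scale a per-minute rate to a per-day total."""
    return rate * MINUTES_PER_DAY

# Google: 40,000 searches per second.
google_daily = per_day_from_per_second(40_000)
print(f"{google_daily:,}")   # 3,456,000,000, i.e. about 3.5 billion per day

# YouTube: 4,146,600 videos watched (assumed to be per minute).
youtube_daily = per_day_from_per_minute(4_146_600)
print(f"{youtube_daily:,}")  # 5,971,104,000

# 2.5 quintillion (short scale) = 2.5 * 10**18 bytes = 2.5 exabytes (decimal units).
bytes_per_day = 2.5e18
print(f"{bytes_per_day / 1e18} exabytes")  # 2.5 exabytes
```

The first result confirms that "40,000 searches every second" and "3.5 billion searches per day" describe the same rate.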
18. What Is the Practice of Machine Learning?
• ML algorithms are great at finding patterns that look like what you tell them to look for
• They will not help you figure out what type of patterns make sense for your problem
• They will always find patterns, but will not quantify how strong these patterns really are in the data
This is ML interpreted narrowly: understand how the models work.
This is the practice of ML: additionally understand when each model is appropriate, how to evaluate success, and what tools are available for tailoring each model to your specific problem.
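To illustrate that an algorithm will "always find patterns", here is a minimal pure-Python sketch (all function names are illustrative): a 1-nearest-neighbour classifier trained on pure noise memorises its training data perfectly, yet scores near chance on fresh data. The training score claims a pattern; only held-out evaluation reveals it has no real strength.

```python
import random

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]

def accuracy(train_X, train_y, X, y):
    """Fraction of points in (X, y) that the 1-NN model labels correctly."""
    hits = sum(nearest_neighbor_predict(train_X, train_y, xi) == yi for xi, yi in zip(X, y))
    return hits / len(y)

def make_noise(n, rng, d=5):
    """Random features with labels drawn independently of them: there is no real pattern."""
    X = [[rng.random() for _ in range(d)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y

rng = random.Random(0)
train_X, train_y = make_noise(200, rng)
test_X, test_y = make_noise(200, rng)

train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"train accuracy: {train_acc:.2f}")  # 1.00: every training point is its own nearest neighbour
print(f"test accuracy:  {test_acc:.2f}")   # near 0.5, i.e. chance level
```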
24. Probabilities: Monty Hall Problem
https://en.wikipedia.org/wiki/Monty_Hall_problem
[Figure: three doors, numbered 1, 2 and 3]
http://www.montyhallproblem.com
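Scenario 1: You decide to stay with your original choice of door.
Probability of winning the prize in scenario 1 (product rule for each case, then sum over the disjoint events "prize is behind your door" and "prize is behind another door"):
P(win) = P(prize behind your door) × P(win | stay) + P(prize behind another door) × P(win | stay) = 1/3 × 1 + 2/3 × 0 = 1/3 + 0 = 1/3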
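The 1/3 result can be checked empirically. Below is a minimal Monte Carlo sketch, assuming the standard rules (the host always opens an unchosen door that hides no prize, picking at random when two doors qualify); the switching strategy is included only for contrast, and the function names are illustrative.

```python
import random

def play(stay: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant wins the prize."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)
    choice = rng.choice(doors)
    # The host opens a door that hides no prize and was not chosen.
    opened = rng.choice([d for d in doors if d not in (choice, prize)])
    if not stay:
        # Switch to the single remaining closed door.
        choice = next(d for d in doors if d not in (choice, opened))
    return choice == prize

def win_rate(stay: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability of winning with a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(stay, rng) for _ in range(trials)) / trials

print(f"stay:   {win_rate(stay=True):.3f}")   # close to 1/3
print(f"switch: {win_rate(stay=False):.3f}")  # close to 2/3
```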
28. ML Themes & Best Practices
• Themes
• Bias/variance tradeoff
• Optimality/efficiency tradeoff
• There is never a "right" answer
• Different tools and models are appropriate in different situations.
• Best practices
• Use cross-validation for parameter tuning
• Maintain a separate validation set
• Measure quality with precision, recall, F1
• Use packages (don't write your own algos, unless it's for learning)
• Get to know your data
• Try lots of different things, but explore in a principled way
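Two of the best practices above can be sketched from scratch. This is for understanding only (in keeping with "use packages" for real work); the function names and sample labels are illustrative. The k-fold splitter does not shuffle, whereas real use usually shuffles the data first.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; every index is used for validation exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.6 0.75 0.667

for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), val_idx)
```

In a tuning loop, each candidate parameter setting would be trained on `train_idx` and scored (for example with F1) on `val_idx`, averaging over the folds.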
29. Digital Transformation
• Digital transformation is the integration of digital technology into all areas of business, fundamentally changing how you operate and deliver value to
customers.
• Customer experience
• Operational agility
• Culture and leadership
• Workforce enablement
• Digital technology integration
• NoSQL Databases:
a. Column: Each storage block contains data from only one column. Ex: Accumulo, Cassandra, Scylla, Apache Druid, HBase, Vertica.
b. Document: It stores documents made up of tagged elements. Ex: Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase,
Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB.
c. Key-value: It has a big hash table of keys and values. Ex: Aerospike, Apache Ignite, ArangoDB, Berkeley DB, Couchbase, Dynamo,
FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, SciDB, SDBM/Flat File dbm, ZooKeeper.
d. Graph: A network database that uses edges and nodes to represent and store data. Ex: AllegroGraph, ArangoDB, InfiniteGraph, Apache
Giraph, MarkLogic, Neo4j, OrientDB, Virtuoso.
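To make the four NoSQL data models concrete, here is a toy sketch using plain Python dictionaries and lists. The records and key names are hypothetical and no real database API is used; it shows only how each model lays out the same data, not how any of the listed products store it internally.

```python
# Hypothetical rows, shared by the examples below.
users = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Alan", "city": "Manchester"},
]

# a. Column: each "storage block" holds the values of a single column.
column_store = {col: [row[col] for row in users] for col in ("id", "name", "city")}

# b. Document: each record is a self-describing document of tagged (named) fields,
#    which may nest and may differ from one document to the next.
document_store = {
    "users/1": {"name": "Ada", "city": "London", "tags": ["math"]},
    "users/2": {"name": "Alan", "city": "Manchester"},
}

# c. Key-value: one big hash table from keys to values.
kv_store = {f"user:{u['id']}:name": u["name"] for u in users}

# d. Graph: nodes plus labelled edges connecting them.
nodes = {1: "Ada", 2: "Alan"}
edges = [(1, "KNOWS", 2)]

def neighbours(node_id):
    """Follow outgoing edges from a node."""
    return [dst for src, _rel, dst in edges if src == node_id]

print(column_store["city"])               # ['London', 'Manchester']
print(document_store["users/1"]["tags"])  # ['math']
print(kv_store["user:2:name"])            # Alan
print([nodes[n] for n in neighbours(1)])  # ['Alan']
```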
30. Data as a Service
DaaS is an approach to making data available whenever it is needed, and it fits into the larger Service-Oriented Architecture (SOA) design pattern.
DaaS is an approach, within SOA, that values, shares and focuses on data.
Why Data as a Service?
• First of all, creating a bunch of services that move data around does not constitute DaaS unless they are designed to yield certain
benefits.
Here are key benefits that motivate and define DaaS:
1. Valuable, reusable and uniform
Data services in a DaaS environment should have value across multiple projects, and the data formats and data services should be designed so
that their value both outlasts and exceeds that of the particular systems that first use them.
2. Secure
Security, in particular, must be uniform and ubiquitous. It is a barrier to adoption if some underlying systems use different security
models. Different groups will not share data without built-in security, and too much data without controls becomes a privacy and
compliance risk.
3. Virtual data and abstraction
Data Services should abstract away from underlying data stores and locations, including “silo busting” combinations of data from
multiple sources in multiple formats, presented seamlessly as one service.
4. Do not focus on the plumbing (enabling technologies)
Think about what would happen if you hired a plumber as the architect for your house. You'd likely end up with pipes, valves and other
exposed internals running through your living room, complicating and cluttering it rather than making your house livable. Plumbing
should be hidden and transparent, and invisibly enable your structure to function.
DaaS is an architectural pattern. Most developers know how to add SOAP services with WSDL definitions or REST calls, and how to pass XML (or
JSON or RDF) around. This technique may be necessary, but doing so gratuitously does not help create a DaaS architecture. Focus on data formats
and service definitions, not the protocols and technologies used to expose and wire them up.
5. Don't confuse DaaS with cloud
Just as plumbing is an enabling technology, cloud computing is an infrastructure approach. DaaS is about the architecture, so it must focus on how data
is formatted and transmitted, and on the interfaces between subsystems. Which servers a system runs on is very important, but it should not be confused
with the DaaS pattern. Yes, you can put a server hosting DaaS services in the cloud.
6. Understand why DaaS is not an enterprise data warehouse
Enterprise data warehouse (EDW) efforts often fail because of modeling complexity. DaaS is more agile in that you can roll out individual services
without modeling your entire enterprise first.
A "big design up front" modeling exercise that involves underlying databases and E-R diagrams will have the same failure modes as a large enterprise
data warehouse, which is to say, many.
7. Forget about relational modeling (at least at first)
In DaaS, the service formats are king: that is, the "wire" formats used to integrate components in your enterprise. The point is to abstract away from
which underlying system or systems participate in serving the request, including abstracting away from your relational database and its physical
model. The underlying systems could be microservices, relational databases, search engines or NoSQL databases.
8. Don't let data service development be anyone's second job
A mentor of mine once pointed out that "every organization is destined to build an enterprise architecture that mirrors their org chart", and I have found
that to be absolutely true.
If you let your business service modelers, developers or DBAs define your data services, you will end up with services that are only good for the
immediate task at hand, and do not provide the lasting value and abstractions you need for a good DaaS architecture.
Instead, empower a team to own the data services and take a stand for clean data services that yield lasting value. That debate and negotiation will
improve your entire enterprise.
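Points 3, 4 and 7 describe a service whose wire format is the contract while the backing stores stay hidden. A minimal sketch of that idea, assuming two hypothetical backends (an in-memory SQLite table and a Python dict standing in for a NoSQL store); the class name, table and column names, and JSON field names are all illustrative, not a prescribed DaaS API.

```python
import json
import sqlite3

# Hypothetical backend 1: a relational table with cryptic physical column names.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cust (cust_id INTEGER PRIMARY KEY, full_nm TEXT)")
db.execute("INSERT INTO cust VALUES (42, 'Grace Hopper')")

# Hypothetical backend 2: a document/NoSQL-style store holding preferences.
prefs_store = {"42": {"channel": "email", "lang": "en"}}

class CustomerDataService:
    """Callers see one stable wire format, never the underlying stores."""

    def __init__(self, conn, prefs):
        self._conn = conn
        self._prefs = prefs

    def get_customer(self, customer_id: int) -> str:
        row = self._conn.execute(
            "SELECT cust_id, full_nm FROM cust WHERE cust_id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            raise KeyError(customer_id)
        prefs = self._prefs.get(str(customer_id), {})
        # The wire format is the contract; physical names like full_nm stay hidden,
        # and data from both sources is combined into one record ("silo busting").
        record = {"id": row[0], "name": row[1], "preferredChannel": prefs.get("channel")}
        return json.dumps(record, sort_keys=True)

service = CustomerDataService(db, prefs_store)
print(service.get_customer(42))
# {"id": 42, "name": "Grace Hopper", "preferredChannel": "email"}
```

Because callers depend only on the JSON fields, either backend could be replaced (for example, the relational table moved into a microservice) without changing the service contract.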
33. www.productschool.com
Part-time Product Management, Coding, Data Analytics, Digital
Marketing, UX Design and Product Leadership courses in San
Francisco, Silicon Valley, New York, Santa Monica, Los Angeles,
Austin, Boston, Boulder, Chicago, Denver, Orange County,
Seattle, Bellevue, Washington DC, Toronto, London and Online