Welcome to the 2nd session of BET..
Faculty for the session:
Dr. Sanjay Bhatikar
Director, Monsanto Research Centre
11th August 2016, from 2:25 pm to 4:00 pm
In this session we will hear about
So.. Let’s BET!
B. E. T. - Business Education Timeout!
Bricks with Clay: Building blocks for a career in Data science
I. THE ART OF SCIENCE
Art becomes science
Cogito, ergo sum
Cogito, ergo ..ummm
II. AI – AND SO IT BEGAN
Create
Market
Make
III. HOW TO DECIDE
ARE YOU SMARTER THAN 134 OTHER WOMEN IN
MONSANTO WOMEN’S NETWORK?
1. In this short puzzle, you’ll try to outwit the masses – who are also trying to outwit you.
2. Your mission is to read the minds of your fellow Prasiti members.
3. Pick a number from 0 to 100, with that number representing your best guess of two-thirds of the average of all numbers chosen in the contest.
4. Stop and think for a second. What is everyone else going to do?
EXAMPLE I
Numbers chosen: 41, 37, 42, 49, 33, 51, 43, 35, 47
Average: 42
You Win by Picking: 28
EXAMPLE II
Numbers chosen: 21, 1, 12, 10, 13, 9, 17, 5, 11
Average: 11
You Win by Picking: 7
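The arithmetic behind the two examples can be checked with a short Python sketch. The function names `winning_pick` and `level_k_guesses` are illustrative, and rounding to the nearest whole number is an assumption made to match the slide's answers. The second function shows one common way to reason about "what is everyone else going to do": start from a naive guess (assumed here to be 50) and repeatedly take two-thirds of it.

```python
from statistics import mean


def winning_pick(guesses):
    """Two-thirds of the average guess, rounded to the nearest whole number."""
    return round(2 * mean(guesses) / 3)


def level_k_guesses(start=50.0, levels=5):
    """Assumed reasoning chain: each level best-responds to the previous one."""
    chain = [start]
    for _ in range(levels):
        chain.append(chain[-1] * 2 / 3)
    return chain


example_1 = [41, 37, 42, 49, 33, 51, 43, 35, 47]
example_2 = [21, 1, 12, 10, 13, 9, 17, 5, 11]

print(mean(example_1), winning_pick(example_1))
print(mean(example_2), winning_pick(example_2))
print([round(g, 1) for g in level_k_guesses()])
```

Each extra level of "they think that I think" pulls the guess lower, which is why the chain keeps shrinking toward zero.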
When you’re ready, write the number down.
Stop and think for a second. What is everyone else going to do?
GAME ON!
We tend to make every decision like it is the first time ever.
DECISION
ALTERNATIVES
ARGUMENTS
CRITERIA ASSUMPTIONS CONSTRAINTS
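One way to make this anatomy concrete is to record a decision as a small data structure. The sketch below is only an illustration: the class and field names (`Decision`, `Arguments`) are assumptions, and the sample values come from the pizza example in the editor's notes for slide #27.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Arguments:
    """The three ingredients of an argument named on the slide."""
    criteria: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)


@dataclass
class Decision:
    """A decision point, its alternatives, and the arguments behind them."""
    question: str
    alternatives: List[str]
    arguments: Arguments


last_slice = Decision(
    question="Who has the last slice of pizza?",
    alternatives=["I do", "I don't"],
    arguments=Arguments(
        criteria=["Is it in your half of the plate?"],
        assumptions=["You are by yourself, out on a date, or with a big group"],
        constraints=["A pizza slice is indivisible"],
    ),
)

print(last_slice.question, last_slice.alternatives)
```

Once decisions are captured in a shape like this, they can be stored, searched and compared instead of being made "like it is the first time ever".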
IV. OF SHAKESPEAREAN INSULTS &
BEER
How to represent Date in an IT system?
Alternatives: String | Julian | Object
String
1. Ease of interpretation
2. Multiple nomenclatures
3. Operations require implementation
Julian
1. Counter-intuitive
2. One nomenclature for all purposes
3. Operations simpler with libraries
Object
1. Platform dependent
2. Ease of operations
Examples: mm/dd/yy (String), 1470900783 (Julian)
Criteria: <interpretability> <operability> <portability>
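The three alternatives can be compared side by side in Python. This is a hedged sketch: the string `"08/11/16"` is an assumed instance of the slide's `mm/dd/yy` pattern, and the number `1470900783` is read here as seconds since 1 January 1970 UTC, which is an interpretation of the value the slide labels "Julian".

```python
from datetime import datetime, timedelta, timezone

as_string = "08/11/16"    # assumed example of the mm/dd/yy pattern
as_number = 1470900783    # read as seconds since 1970-01-01 UTC
as_object = datetime.fromtimestamp(as_number, tz=timezone.utc)

# String: easy to read, but the same text has more than one reading.
us_reading = datetime.strptime(as_string, "%m/%d/%y")   # 11 August 2016
eu_reading = datetime.strptime(as_string, "%d/%m/%y")   # 8 November 2016

# Number: counter-intuitive to read, but arithmetic is plain addition.
one_day_later = as_number + 24 * 60 * 60

# Object: operations come built in.
next_week = as_object + timedelta(days=7)

print(as_object.isoformat())
print(us_reading.date(), eu_reading.date())
print(next_week.date())
```

The two readings of the same string illustrate the "multiple nomenclatures" point, while the number and the object trade readability for ease of operations.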
V. DATA MODELS &
VISUALIZATION
Let’s have a party and invite all our friends!
Let’s do that!
Great! I’ll get a list going and mail it out to you. Update it and revert.
What the @*#?<!
# Name Veg Beer
1 Marco N Y
2 Ryan Y Y
# Name Veg Beer
1 Marco N Y
2 Ryan Y Y
3 Jane N Y
# Name Veg Beer
1 Marco N Y
3 Jane N N
# Name Veg Beer
1 Marco N Y
2 Ryan Y Y
3 Jane N N
MODEL VIEW CONTROLLER
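The party-list slides above show what goes wrong when copies of a spreadsheet are mailed around. A minimal sketch of the Model-View-Controller alternative follows, assuming a single in-memory guest list; the names `GuestList`, `render` and `Controller` are hypothetical and chosen only for illustration.

```python
class GuestList:
    """Model: the one shared copy of the data."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def create(self, name, veg, beer):
        guest_id = self._next_id
        self._rows[guest_id] = {"name": name, "veg": veg, "beer": beer}
        self._next_id += 1
        return guest_id

    def read(self):
        return sorted(self._rows.items())

    def update(self, guest_id, **changes):
        self._rows[guest_id].update(changes)

    def delete(self, guest_id):
        del self._rows[guest_id]


def render(rows):
    """View: lay the rows out the way the slide shows them."""
    lines = ["# Name Veg Beer"]
    for guest_id, row in rows:
        lines.append(f"{guest_id} {row['name']} {row['veg']} {row['beer']}")
    return "\n".join(lines)


class Controller:
    """Controller: turns user actions into model changes and a fresh view."""

    def __init__(self, model):
        self.model = model

    def rsvp(self, name, veg, beer):
        self.model.create(name, veg, beer)
        return render(self.model.read())

    def change_order(self, guest_id, **changes):
        self.model.update(guest_id, **changes)
        return render(self.model.read())


app = Controller(GuestList())
app.rsvp("Marco", "N", "Y")
app.rsvp("Ryan", "Y", "Y")
app.rsvp("Jane", "N", "Y")
final_view = app.change_order(3, beer="N")
print(final_view)
```

Because every change goes through one model, Jane's update cannot silently drop Ryan's row, which is the failure the mailed-around lists illustrate.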
Movie
Name
Year_of_Release
Genre
Rating
Stars
Movie
ID
Name
Year_of_Release
Rating
Stars
Genre
ID
Name
MG_Link
Movie_ID
Genre_ID
Movie
ID
Name
Year_of_Release
Rating
Genre
ID
Name
MG_Link
Movie_ID
Genre_ID
Actor
ID
Name
Born
Died
Address
MA_Link
Movie_ID
Actor_ID
Movie
ID
Name
Year_of_Release
Rating
Genre
ID
Name
MG_Link
Movie_ID
Genre_ID
Actor
ID
Name
Born
Died
Address
MA_Link
Movie_ID
Actor_ID
SELECT Movie.Name
FROM
Movie, Genre, MG_Link,
Actor, MA_Link
WHERE
Genre.ID = MG_Link.Genre_ID
AND MG_Link.Movie_ID = Movie.ID
AND Actor.ID = MA_Link.Actor_ID
AND MA_Link.Movie_ID = Movie.ID
AND Genre.Name = 'Thriller'
AND Actor.Name = 'Al Pacino'
AND Movie.Rating > 8.5;
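To try the query end to end, the normalized schema can be built in an in-memory SQLite database. This is a sketch under stated assumptions: the column types are guesses, the Actor table is trimmed to ID and Name, and every movie row is a made-up placeholder rather than real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (ID INTEGER PRIMARY KEY, Name TEXT,
                    Year_of_Release INTEGER, Rating REAL);
CREATE TABLE Genre (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Actor (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE MG_Link (Movie_ID INTEGER REFERENCES Movie(ID),
                      Genre_ID INTEGER REFERENCES Genre(ID));
CREATE TABLE MA_Link (Movie_ID INTEGER REFERENCES Movie(ID),
                      Actor_ID INTEGER REFERENCES Actor(ID));

-- Placeholder rows, invented purely for illustration.
INSERT INTO Movie VALUES (1, 'Placeholder Thriller A', 1990, 9.0);
INSERT INTO Movie VALUES (2, 'Placeholder Thriller B', 1995, 8.0);
INSERT INTO Movie VALUES (3, 'Placeholder Drama C', 2000, 9.1);
INSERT INTO Movie VALUES (4, 'Placeholder Thriller D', 2005, 9.2);
INSERT INTO Genre VALUES (1, 'Thriller'), (2, 'Drama');
INSERT INTO Actor VALUES (1, 'Al Pacino'), (2, 'Placeholder Actor');
INSERT INTO MG_Link VALUES (1, 1), (2, 1), (3, 2), (4, 1);
INSERT INTO MA_Link VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# The slide's query, with the literals passed as parameters.
query = """
SELECT Movie.Name
FROM Movie, Genre, MG_Link, Actor, MA_Link
WHERE Genre.ID = MG_Link.Genre_ID
  AND MG_Link.Movie_ID = Movie.ID
  AND Actor.ID = MA_Link.Actor_ID
  AND MA_Link.Movie_ID = Movie.ID
  AND Genre.Name = ?
  AND Actor.Name = ?
  AND Movie.Rating > ?
"""
matches = [row[0] for row in conn.execute(query, ("Thriller", "Al Pacino", 8.5))]
print(matches)
```

Only the first placeholder row satisfies all three conditions: the others fail on rating, genre or actor, which shows how the link tables let one query filter across three entities.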
Movie
ID
Name
Year_of_Release
Genre
Rating
Stars
Genre
ID
Name
MG_Link
Movie_ID
Genre_ID
Actor
ID
Name
Born
Died
Address
MAS_Link
Movie_ID
Actor_ID
Screen_name_ID
Screen_name
ID
Name
Synopsis
Quotes
VI. APPLICATION PROGRAMMING
INTERFACE
POST
GET
PUT
DELETE
Request
Response
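The four verbs line up with the CRUD operations (Create, Read, Update, Delete), as the editor's notes for slide #67 spell out. The sketch below imitates that request/response cycle with a plain dictionary standing in for the server's data; the `handle` function and its return shape are assumptions for illustration, and the status codes are the conventional HTTP ones.

```python
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def handle(store, method, key, body=None):
    """Apply one request to the store and return (status_code, payload)."""
    if method == "POST":
        store[key] = body
        return 201, body
    if method == "GET":
        return (200, store[key]) if key in store else (404, None)
    if method == "PUT":
        if key not in store:
            return 404, None
        store[key] = body          # PUT replaces the whole resource
        return 200, body
    if method == "DELETE":
        if key not in store:
            return 404, None
        del store[key]
        return 204, None
    return 405, None               # verb not supported


guests = {}
print(handle(guests, CRUD_TO_HTTP["create"], "jane", {"veg": "N", "beer": "Y"}))
print(handle(guests, CRUD_TO_HTTP["update"], "jane", {"veg": "N", "beer": "N"}))
print(handle(guests, CRUD_TO_HTTP["read"], "jane"))
print(handle(guests, CRUD_TO_HTTP["delete"], "jane"))
print(handle(guests, CRUD_TO_HTTP["read"], "jane"))
```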
http://www.omdbapi.com/?t=Star+wars
{
"Title": "Star Wars",
"Year": "1983",
"Rated": "N/A",
…
}
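A script can assemble the same request URL and read the response as structured data. The sketch below makes no network call: `sample_response` is a hand-copied, shortened stand-in for what the service returns, containing only the three fields shown on the slide, and `build_request_url` is a hypothetical helper name. The live service may also require parameters not shown on the slide, such as an API key.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://www.omdbapi.com/"


def build_request_url(title):
    """Encode the title as the 't' query parameter, as in the slide's URL."""
    return BASE_URL + "?" + urlencode({"t": title})


# Shortened stand-in for the response body; only the slide's fields appear.
sample_response = '{"Title": "Star Wars", "Year": "1983", "Rated": "N/A"}'

movie = json.loads(sample_response)

print(build_request_url("Star wars"))
print(movie["Title"], movie["Year"], movie["Rated"])
```

Because the response is plain text in a predictable shape, the same few lines work whether the data ends up in a dashboard, a spreadsheet or another application.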
VOLUME
VARIETY
VELOCITY
SHAREABLE
INTEGRATABLE
SCALABLE
UNICORNS
1. Translate the problem: Speak business and IT
2. Re-imagine the process: Cast in digital avatar
3. Construct the product: Implement in code
JUST DO IT
1. Construct Ontologies. Use: Semaphore
2. Build Data Models & Viz. Use: Spotfire, SQL, Excel
3. Design APIs. Use: Behavior Driven Development
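As a taste of the first step, an ontology can start as nothing more than a table of "is a type of" links. The sketch below uses the colander example from the editor's notes; the top-level term "kitchen tool" and the function names are assumptions added for illustration, and a dedicated ontology tool would offer far more than this.

```python
# Each term points to the broader term it is a type of.
IS_A = {
    "colander": "strainer",
    "coffee filter": "strainer",
    "strainer": "kitchen tool",   # assumed parent, for illustration only
}


def ancestors(term):
    """Walk up the 'is a type of' links and collect every broader term."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(term, category):
    """True when the category is reachable from the term."""
    return category in ancestors(term)


print(ancestors("colander"))
print(is_a("colander", "strainer"), is_a("colander", "hat"))
```

With the links in place, a reader (or a program) that has never seen a colander can still infer that it strains things, rather than wearing it as a hat.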
BACKUP
Artifacts
1. Recipes
2. Marker Pen
3. Chocolates

Data science - Bricks with Clay

Editor's Notes

  • #4 Crew breaks for lunch at precise noon. How?
  • #5 Q&A Look at the construction crew in this pic. What level of skill do you think this work requires? Low. In fact, it may not require any skill at all.. 3-D printers. Construction has become standardized. Look at this construction. It is built from pre-fabricated structures, assembled on-site like Lego® bricks. In fact.. (Next: 3D printing)
  • #6 It has become so standardized, That it is possible to 3D print. Teams have demonstrated..
  • #7 House and car share power supply.
  • #8 Q&A Mechanization eliminates routine work. That’s good news, right? It leaves us free to be creative. In the digital era, creative work will also be automated. It wasn’t always so. Guilds.. jealously guarded their art. Over time, art became science. Science squeezed all the art out. So people can lunch at noon, or 3D print houses. This applies to work that is routine and physical, right? What about our ability to think? To create?
  • #9 Rene Descartes, cogito, ergo sum.
  • #13 From mechanization of pizza-making.. Next: Let’s raise the stakes ..
  • #14 From mechanization of pizza-making.. Next: To learning the expertise for brewing beer or the art of a sushi chef
  • #15 Autonomous robot moving about the house.. Next: Let’s raise the stakes ..
  • #16 To autonomous robot moving about the roads
  • #17 Engaging an opponent in the sport of badminton.. Next: Let’s raise the stakes ..
  • #18 To engaging an opponent in mortal combat.
  • #19 Q&A With machines doing tasks involving design and decision-making, what does the workforce of the future look like? People who create, people who make, people who market, will all be rolled into ONE. Unicorns! Tasks that require human creativity, That tap into human ingenuity, Are now within the purview of machine intelligence. Design and decision-making. This is edging out roles that require ONE skill.. People who create (R&D) People who make (Engineers) People who sell (Marketeers) All rolled into one! UNICORNS What are they to do?
  • #20 Help enable your organization’s digital transformation.. Decisions..
  • #21 Problem: We tend to make every decision like it is the first time ever. In Research & Development, in Supply Chain, in Marketing & Sales. That bogs us down. Capture how decisions are made. Activity: Are you smarter than everybody else at this event?
  • #22 Problem Statement We tend to make every decision like it is the first time ever. Decisions.
  • #23 Q&A Where is this game (of guessing what other people are thinking) relevant? Game Theory: How people value a stock option. The point of our little game? To show that there is a science underlying decision making. This science can help capture how decisions are made. Even decisions based on what other people are thinking. Problem Statement We tend to make every decision like it is the first time ever. Decisions.
  • #25 Problem Statement We tend to make every decision like it is the first time ever. That bogs us down. It creates inconsistency. It produces inefficiencies. We can reduce the art of decision-making to a science. How? Anatomy of a decision. Decision Point (How to?) | Alternatives (Design Choices) | Arguments (Criteria, Assumptions, Constraints) E.g. How to store a date in an IT system.
  • #26 Q&A Think about a decision you have made. Can you frame it in the way we discussed?
  • #27 Q&A Can you label anatomical parts of a decision on this helpful scheme from Pizza Hut? Like so: Decision: Who has the last slice of pizza? Alternatives: I do or I don’t Criteria: Is it in your half of the plate? Assumptions: You are: by yourself | out on a date | with a big group Constraints: A pizza slice is indivisible
  • #28 There’s a catch. A small percentage of decisions can be captured this way. The decision about who gets the last slice of pizza was simple. As we know more, it is harder to decide. For example..
  • #29 How the Baltimore Orioles organize their work.. I am sure there are elements here that appeal to you, even without understanding what the great game of baseball is about. What comes in the way of capturing decision rationales .. Lack of standard vocabularies. We need ontologies, specifically .. Product Ontologies: What items get produced in an organization value-stream Process Ontologies: How work gets done, in small steps Decision Ontologies: Criteria, Assumptions and Constraints used to build arguments Product Ontologies: We test proteins for efficacy against insects. Proteins are chemical structures manufactured by genes. They are like a key that opens a lock, called a receptor, on the insect gut and dissolves the insect gut, thus killing the pest. The key opens only one lock. Over time, insects change the lock. So Biotechnologists try to tweak the key. There is a diverse set of protein families, and a range of pest families dispersed across geographical locations. We need consistent nomenclature for proteins, taxonomies for pests and codes for geographical locations. Process Ontologies: Different assays follow different protocols - different ways of preparing insect diet, administering the toxin, capturing data, measuring end-points and reporting results. The assays in India may use wheat flour whereas those in the US use agar. One lab uses LD50 dose (that kills 50% of the bugs) as the toxicologic end-point. Another lab may also record stunting. This comes in the way of data-enabled comparisons. Decision Ontologies: Criteria such as thresholds must be standardized, not left to discretion. Other variables that may affect the observed outcomes require recording and standardization.
  • #31 Q&A How do we standardize a vocabulary? Dictionary, Thesaurus Taxonomies – Classification System Princeton University’s WordNet clusters words by frequency of association e.g. etymological roots. Frequency of association in various contexts can help classify documents leading to inferences about the subject matter of communications. Here’s a taxonomy about my favorite food.. Beer. If beer ain’t your thing..
  • #32 Here’s an ontology of rappers’ names. Rappers have creative names, drawn from Crime | Wordplay | Physical attributes, etc. Now we not only see standardization emerge, But association among concepts, And how they relate to each other.
  • #33 Further exemplified by this “fun” ontology of Shakespearean insults. Handy for year-end appraisals. Some of my favourites are: “Thou rooting hog”, “He has not so much brain as ear wax”, “Thou odoriferous stench”.
  • #34 How does knowing the relationships among concepts help? Example of recipe that calls for use of Colander. Without prior knowledge of what a colander does, I might use it for a hat! With the ontology, I can infer That it is a type of strainer, Like perhaps a coffee filter, And apply it correctly. Show recipe-book and recipe.
  • #35 Q&A How to represent date in a database? String or Julian or Object Decision Ontology – Product Nouns Process Verbs Decision – Argument (Criteria, Assumptions, Constraints) Ontologies are not only the glue for pulling information together to capture decisions, but also the glue of the semantic web. The marriage of Ontologies and APIs is producing the semantic web. The generations of web before: 0. Shareable Web: System for sharing files in digital form in a closed group 1. Browseable Web: Share data and functionality with browser at the interface of (wo)man and machine 2. Searchable Web: Find what you need – data or functionality 3. Integratable Web: Find what you need and wire data into application workflows 4. Semantic Web: It knows! Before we understand APIs, we must take a peek under the hood at a pattern – Model-view-controller
  • #37 Prepare an invitation list for a party in Excel Start with a spreadsheet and mail it around Pass it around in a chain Place it on a shared drive Use Google spreadsheets Model-View-Controller
  • #38 Prepare an invitation list for a party in Excel Start with a spreadsheet and mail it around Pass it around in a chain Place it on a shared drive Use Google spreadsheets All operations with data are CRUD The example illustrates the value of CRUD over web. You got the WHY. The Model-View-Controller is HOW that is done! It tells about the opportunities for Data Science in non-IT departments. (I.e. people who don’t know how to code.)
  • #39 Start with a spreadsheet and mail it around Pass it around in a chain Place it on a shared drive Use google spreadsheets Model-View-Controller
  • #40 Start with a spreadsheet and mail it around Pass it around in a chain Put it on a shared drive Use google spreadsheets Model-View-Controller
  • #41 Start with a spreadsheet and mail it around Pass it around in a chain Put it on a shared drive Use Google spreadsheets Model-View-Controller Again: The example illustrates the value of CRUD over web. You got the WHY. The Model-View-Controller is HOW that is done! It tells about the opportunities for Data Science in non-IT departments. (I.e. people who don’t know how to code.)
  • #42 Model-View-Controller Put database on the server Give a nice user interface (browser) Model: Oracle, MySQL, MongoDB, Spreadsheet, etc. View: Browsers, HTML 5, Spotfire, Tableau, etc. Controller: Programming Languages (Java, Javascript, Python, etc.) & Frameworks (Hibernate, MEAN, Django, etc.)
  • #43 VIEW is everything (After all, isn’t a picture worth a thousand words?)
  • #44 View is everything .. HBR Article – Executives must have visualizations as part of their core skills. Conceive and implement. Key is to make compelling dashboards that make complex interactions between variables buried in a ton of data apparent at a glance.
  • #45 Who plays which shot from where to what effect? Dirk’s plays at his most potent from (here) but takes a lot of wasted shots from (here).
  • #46 Who plays which shot from where to what effect? Ray plays at his most potent from (here) and that is where he takes most of his shots from.
  • #47 Big Data – About telling a story with a humongous amount of data in one picture. Such as, key events in the entire history of the universe, like when stars appeared, or that the universe is expanding.
  • #48 Continuing theme.. Telling “big data” stories with simple and compelling visualizations. Shows that the number of events has increased and attendance is divided across dispersed events. -new entrants, like Bangalore.
  • #49 New York Times is a great source where “big data” visualizations abound. Stories about.. GOVERNMENT How Obama spends US taxpayers’ money..
  • #50 New York Times is a great source of “big data” visualizations. Stories about.. BUSINESS Companies that went IPO
  • #51 New York Times is a great source of “big data” visualizations. Stories about.. SOCIETY Spread of Ebola
  • #52 New York Times is a great source of “big data” visualizations. Stories about.. SOCIOECONOMICS Spread of Ebola
  • #53 Activity: Represent this pen Write down every attribute about this pen you want to capture How would you represent this? Table? Problem User Interface constraint Visualizations are a great way for story-telling with big data They are not a great way of communicating what data is to be captured. Hence.. MODEL Again, views have limitations. Not a good way of communicating the requirement about what data needs to be captured Backend-changes extract a high cost (e.g. adding a new mode of payment, say, Demand Draft) Limits what data can be pulled out, in what form
  • #54 DATA is everything. (Big Data is almost a cliché.)
  • #55 Q&A: What types of data are you familiar with? Maps, Social Graphs, Tabular Data (Spreadsheets) Let’s just pick tabular data..
  • #56 Activity: Normalize a schema Movies: Data on movies from the IMDb database. Movie Year of Release Genre Rating Stars Ponderables: Show the limitations of stuffing too much data in a spreadsheet Say I want to search all movies by ratings, what shall I do? Say I want to search all movies by ratings in a particular genre, what shall I do?
  • #57 Activity: Normalize a schema Ponderables: Search across multiple spreadsheets? Ergo, RDBMS Say I want to search all movies by ratings, what shall I do? Say I want to search all movies by ratings in a particular genre, what shall I do? Say I want to search all movies by ratings in a particular genre featuring a star I like, what shall I do? Say I want to add information about actors, what shall I do?
  • #58 Activity: Normalize a schema Ponderables: Adding more information requires more tables. Say I want to search all movies by ratings, what shall I do? DONE Say I want to search all movies by ratings in a particular genre, what shall I do? DONE Say I want to search all movies by ratings in a particular genre featuring a star I like, what shall I do? Say I want to add information about actors, what shall I do?
  • #59 Activity: Normalize a schema Ponderables: Say I want to search all movies by ratings, what shall I do? DONE Say I want to search all movies by ratings in a particular genre, what shall I do? DONE Say I want to search all movies by ratings in a particular genre featuring a star I like, what shall I do? DONE Say I want to add information about actors, what shall I do? DONE Say I want to add information about screennames, what shall I do?
  • #60 Activity: Normalize a schema Ponderables: Say I want to search all movies by ratings, what shall I do? DONE Say I want to search all movies by ratings in a particular genre, what shall I do? DONE Say I want to search all movies by ratings in a particular genre featuring a star I like, what shall I do? DONE Say I want to add information about actors, what shall I do? DONE Say I want to add information about screen names, what shall I do? DONE
  • #61 Q&A Why not perform CRUD on data on a server? Language | Interpretation | Cost | Access Schema explosion! Problem: Knock-yourself-out complexity CRUD Language: SQL has a steep learning curve Interpretation: Databases are normalized, so the association with physical objects requires substantial effort Cost: Hire “bi-linguals” – who speak data and domain – this has a cost Access: Access to DB is non-trivial. There’s got to be a better way. Ergo, APIs!
  • #62 Volume-Variety-Velocity. Examples: Uber Inshorts There’s got to be a better way. APIs!
  • #63  Model-View-Controller Model Types of Data Maps Graph (Social) Tabular
  • #64 Q&A: What types of data are you familiar with? Maps, Social Graphs, Tabular Data (Spreadsheets) Data is text.
  • #65 Data is text
  • #66 Data is (still) text.
  • #67 CRUD maps to the HTTP verbs: C is POST, R is GET, U is PUT, D is DELETE
  • #68 Activity: LinkedIn API [Optional]
  • #69 This stripped-down version, called JSON, gives more flexibility. We can find data and organize it differently from the UI (Shareable, Searchable) We can take data from multiple sources and integrate it (Integratable) And we can use machines to do all of it at lightning speed (Scalable) Thus delivering VVV!
  • #70 VVV for NYT – I can now pull out a large corpus with a script that invokes the API InShorts – I can pull data out of various sources, each of which has an API, and integrate it Uber – I can do this in real-time
  • #71 Activity: LinkedIn API [Optional] BDD and unit tests Epics and Stories.. [JIRA] Microservices and API end-points [BDD Suite e.g. Mocha] Data model [Excel, ORM]