Diego Oppenheimer discusses the rise of algorithm marketplaces and the new "algorithm economy". Key points include:
- Advances in machine learning, computer vision, speech recognition and natural language processing are enabling algorithms to interpret unstructured data at scale.
- Algorithm marketplaces allow algorithms to be hosted, discovered, monetized and composed modularly to address a wide range of use cases across many industries.
- The algorithm economy will lower barriers to applying machine intelligence and foster innovation as algorithms become reusable assets that creators and users can both benefit from.
Algorithm Marketplace and the new "Algorithm Economy"
1. Algorithm Marketplaces and the new "algorithm economy"
Data Day Texas 1-16-2016
Diego Oppenheimer
CEO and Founder
2. $100 free to get started. Signup at
Algorithmia.com with Promo Code:
DATADAYTX
3. Diego Oppenheimer - CEO / founder Algorithmia
• 10+ years building Business Intelligence and Big Data tools
• Led advanced data analysis tool development at Microsoft - 1 billion users reached
• Shipped Excel, SQL Server, PowerBI v1.0
• Previously founded an algorithmic trading startup
• Techstars/Startup Weekend Coach and Mentor
• B.S. and M.S. Carnegie Mellon University
• Passionate data analysis enabler
Email: diego@algorithmia.com @doppenhe
5. The briefest history of technology…ever
• “In economics productivity is a measure of technological progress. Productivity increases when fewer inputs are used in the production of a unit of output”
• We went from hunter-gatherer societies to agriculture to industry, and now to the next revolution: the interpretation of data.
• Algorithms are at the center of the next revolution. They are the tools of our generation.
• If data is the new oil, advanced algorithms are the drilling platforms, pipelines, tankers and gas stations.
6. “…data is inherently dumb. It doesn’t actually do anything unless you know how to use it.
And big data is even harder to monetize due to the sheer complexity of it.
Data alone is not going to be the catalyst for the next wave of IT-driven innovation. The next
digital gold rush will be focused on how you do something with data, not just what you do
with it. This is the promise of the algorithm economy.”
Peter Sondergaard (Gartner Research)
7. Staggering pace of data collection
Sources: Cisco, ComScore, MadReduce, Radicati Group, DataScienceCentral, Insights wired, IBM, EMC, GMAOnline, Twitter, YouTube, Manthan for Strategic Innovation
• 10,000 Tweets per sec
• 2,283 Images per sec
• 1,792 Skype calls per sec
• 49,466 Google searches per sec
• 103,310 Videos viewed per sec
• 2,406,488 Emails sent per sec
• 55,000,000 Status updates per day
• 28,260 Gigabytes of traffic flow through the internet per sec
• By 2018, 69% of online traffic will be mobile video
• 68% of all unstructured data in 2015 attributed to consumers
• In 2015 enterprise unstructured data will cross 1600 exabytes
8. Rise of unstructured data
“Unstructured data: data value that has little or no metadata and is therefore difficult to categorize.”
Internal External
Where is it coming from?
Photo and Videos, Audio Data, Social Media, Transactions, Log Data, Emails
Brand and Social Media properties
Customer Service Centers
Mobile and Market research data
Employee Performance reviews
Consumer Survey Data
Candidate Interviews
Merchandising photos
Crowd Sourcing
Web Scraping
Social Media
Blogs and Chat Rooms
Consumer Product Reviews
9. Classifying unstructured data
Cognition: “the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses.”
Humans are the gold standard for interpreting unstructured data…we just don’t scale.
Businesses that succeed will be the ones that are able to interpret their unstructured data with near-human efficiency at superhuman scale.
10. Modelling humans in machines
Human Cognition -> Machine Intelligence
Learning -> Machine Learning
Perception -> Computer Vision and Speech Recognition
Communication -> Natural Language Processing
Social Intelligence -> Affective Computing
Planning -> Automated Scheduling
Machines can provide super human scale.
11. Why now?
1990s: Connectivity $10,000 per month; Servers $20,000 per box; Storage $1,000/GB
2000s: Connectivity $1,000 per month; Servers $1,000 per box; Storage $10/GB
2010s: Connectivity 10 cents/GB; Servers 20 cents/hour; Storage 12 cents/GB
Super human scale = machines…and today they are cheap, plentiful and fast.
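Storage is the only row quoted in the same unit (dollars per GB) across all three decades, so its price drop can be worked out directly. The short sketch below does that arithmetic using only the storage figures from this slide; the function and variable names are illustrative, not from the talk.

```python
# Storage prices in USD per GB, as quoted on the "Why now?" slide.
storage_usd_per_gb = {"1990s": 1000.00, "2000s": 10.00, "2010s": 0.12}


def price_drop(prices, start, end):
    """Return how many times cheaper the `end` decade is than the `start` decade."""
    return prices[start] / prices[end]


if __name__ == "__main__":
    for start, end in [("1990s", "2000s"), ("2000s", "2010s"), ("1990s", "2010s")]:
        factor = price_drop(storage_usd_per_gb, start, end)
        print(f"{start} -> {end}: about {factor:,.0f}x cheaper per GB")
```

The connectivity and server rows switch units between decades (per month to per GB, per box to per hour), so they cannot be compared this way without further assumptions.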
12. Advances in Natural Language Processing
• We now suddenly have dozens of open-source libraries available in the natural language processing space.
• NLTK – ApacheNLP – ScalaNLP – StanfordNLP – etc
• We understand sentiment, intent and entities, and are getting better at it every day.
• The combination with knowledge graphs is allowing us to interpret subject matter almost immediately.
• StockTwits using tweets as signal for trading.
• Ai2 – Interpret questions - Pass the 8th grade geometry test
• Genomic research Great summarizer
13. Feed in text and let a machine answer questions about it through inference - Facebook / Lord of the Rings
14. Advances in Computer Vision
• Again, dozens of libraries per language, and a huge pain to work with.
• Wrangling OpenCV is a dark art form.
• Ai2.org passed the 8th grade geometry test, interpreting graphics
• Google Vision API/ Clarifai – submit an image get fully recognized objects
• Visual shopping: finding items that look similar to what you are viewing, made very easy by deep learning.
15. Advances in Speech Recognition
• Siri/Cortana/Google Now
• Amazon Echo
• Skype live translator / Baidu Mandarin-English translator
• CMU Sphinx – training on different lexicons, data sets and sophistication of language levels.
• Tone Sentiment prediction for customer service calls – Wise.io
We now talk to our machines and they “get us”.
16. “I cannot see ten years into the future. For me, the wall of fog starts at about 5
years.
... I think that the most exciting areas over the next five years will be really
understanding videos and text. I will be disappointed if in five years time we do not
have something that can watch a YouTube video and tell a story about what
happened. I have had a lot of disappointments.”
-Geoffrey Hinton’s AMA on Reddit
17. “It’s not about the pieces, it’s how the pieces work together”
- ICE CUBE
19. Most common use cases at the intersection of machine intelligence and Big Data
• Marketing
• Product recommendations
• Customer Service
• HR
• Fraud and Churn prevention
• Infrastructure monitoring
• Crime prevention
But…the use cases to which machine intelligence can be applied are growing at a staggering pace.
We move from the era of “capture everything” to being able to “act on everything”.
20. What do all these use cases have in common?
• Similar techniques trained on different data sets
• Combine multiple techniques and algorithms
• Engineers need to build every step of the pipeline…and then scale it.
• What’s the problem?
• The skill set to build models != the skill set to scale models
• The skill set to tune algorithms != the skill set to build pipelines
• Almost every single use case requires re-inventing the wheel.
21. All this power …now what?
• Huge advances in multiple fields of machine intelligence, but practical implementation is still hard.
• Finding the right algorithm/library or framework still a challenge.
• Huge disconnect between academic/top tech companies and rest
of industry.
• Top tech company? Let’s go buy a lab.
• Code reusability mostly a myth.
• Incentives of researchers and users are not aligned, leading to a disconnect.
A novel approach: Algorithm Marketplaces
23. Algorithm Marketplaces
• Host algorithms
Anyone can turn their algorithms into scalable, shareable, production-ready web services
Typical users: scientists, academics, domain experts
• Make algorithms discoverable
Anyone can use and integrate these algorithms into their solutions
Typical users: businesses, data scientists, app developers, IoT makers
• Are monetizable
Align incentives between algorithm creators and consumers
Typical scenarios: heavy-load use cases with large user base
• Are modular
Algorithms can be stacked or piped together
Typical scenarios: interpretation of unstructured data
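To make "stacked or piped together" concrete, here is a minimal sketch of composing small algorithms into one callable. It is plain local Python, not the Algorithmia API; `pipe` and the three toy steps are hypothetical names chosen for illustration.

```python
import re
from functools import reduce


def pipe(*steps):
    """Compose single-argument functions, left to right, into one callable."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Three toy "algorithms" standing in for independently published building blocks.
def html_to_text(html):
    """Replace every HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", html)


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(tokens):
    """Count how often each token occurs."""
    counts = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts


# The output of each step becomes the input of the next.
count_words_in_html = pipe(html_to_text, tokenize, word_counts)

if __name__ == "__main__":
    print(count_words_in_html("<p>Data is data</p>"))
```

Because each step only agrees on its input and output shape, any one of them could be swapped for a better implementation without touching the others, which is the property the slide is pointing at.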
29. Topic Analysis
Twitter Youtube Satellite Imagery
Computer Vision
Artificial Neural Networks
The future is building blocks…
31. Use Case #1: Birth of new algorithms – Nudity Detection
Algorithms Used
● Face Detection
● Nose Detection
● Skin Color Detection
Based on work from LaSalle University
32. Use Case #2: Unsupervised content recommendation
Algorithms Used
● Breadth First Sitemap
● Analyze URL
● Keywords for Document Set
● Keyword Set Similarity
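The first step in this list, a breadth-first sitemap, can be sketched as an ordinary breadth-first traversal. This is an illustration of the general technique, not the marketplace algorithm itself: the function name is hypothetical, and an in-memory `links` dictionary stands in for actually fetching pages.

```python
from collections import deque


def breadth_first_sitemap(links, start, max_depth=2):
    """Return pages reachable from `start` within `max_depth` hops, in BFS order.

    `links` maps a URL to the list of URLs it links to; it is a stand-in
    for downloading each page and extracting its links.
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


if __name__ == "__main__":
    site = {"/": ["/a", "/b"], "/a": ["/c", "/"], "/b": ["/c"], "/c": ["/d"]}
    print(breadth_first_sitemap(site, "/", max_depth=2))
```

The resulting page list would then feed the later steps (analyzing each URL and extracting keywords).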
33. Use Case #3: Video Recommender
Algorithms Used
● Get Links
● Download Youtube
● Speech 2 Text
● TF-IDF
● Keywords for Document Set
● Keyword Set Similarity
https://algorithmia.com/strata
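The last three steps of this pipeline (TF-IDF, keywords per document, keyword set similarity) can be sketched in a few lines once the transcripts exist as text. This is a minimal sketch under assumptions: it uses a plain term-frequency times log-inverse-document-frequency score and Jaccard overlap for set similarity, which may differ from what the marketplace algorithms do, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(documents, k=3):
    """Return, for each document, the set of its k highest-scoring TF-IDF terms."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    keyword_sets = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically so results are stable.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keyword_sets.append(set(ranked[:k]))
    return keyword_sets


def keyword_set_similarity(a, b):
    """Jaccard similarity: shared keywords divided by all distinct keywords."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    transcripts = [
        "neural networks learn image features",
        "neural networks classify image pixels",
        "baking bread needs flour and yeast",
    ]
    keywords = top_keywords(transcripts, k=3)
    for i in range(1, len(keywords)):
        print(i, keyword_set_similarity(keywords[0], keywords[i]))
```

A recommender built this way would rank other videos by their similarity score against the one being watched.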
37. • Reusable algorithms are now monetizable IP, driving choice and fostering reuse.
• The shortage of algorithm developers and data scientists will lead to more generic model creation that can scale to the demand
• “Bring your own data”
• Marketplaces will bring the benefits of the app economy to software development, lowering
software distribution costs and improving access to thousands of algorithms.
• Provides a new avenue where open-source and monetization can co-exist.
• Algorithm creators benefit from constant feedback from the algorithm callers – improving speed
of innovation and quality.
40. Algorithmia is the leading solution for finding, sharing, and using state-of-the-art algorithms among complex teams with diverse technologies
• 16k+ developers
• 1.8k algorithms
• 86 countries
41. Sample algorithms
● Text Analysis: summarizer, sentence tagger, profanity detection
● Machine Learning: digit recognizer, recommendation engines
● Web: crawler, scraper, pagerank, emailer, html to text
● Computer Vision: image similarity, face detection, smile detection
● Audio & Video: speech recognition, sound filters, file conversions
● Computation: linear regression, spike detection, fourier filter
● Graph: traveling salesman, maze generator, theta star
● Utilities: parallel for-each, geographic distance, email validator
● Classifiers: deep learning models
43. Some predictions
• Algorithm marketplaces will be the driving force in lowering the bar for machine intelligence
adoption
• Enterprises will worry less about where their data is going in favor of being able to stay ahead of their business as data collection gets unruly.
• Data locality concerns will be solved by ever-moving compute clusters
• Move compute to the data, not vice versa
• Algorithmic inception
• Algorithms that tune other algorithms -> the automated data scientist.
44. The future…is more autonomous
AutoML – Auto Machine Learning
Ensemble learning
Hyperparameter optimization
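Hyperparameter optimization is the simplest case of one algorithm tuning another. The sketch below is an illustration under assumptions, not an AutoML system: a generic `tune` function does an exhaustive search over candidate values of k for a tiny 1-D nearest-neighbour classifier. The data points are made up for the example and all names are hypothetical.

```python
def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest 1-D training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def accuracy(train, validation, k):
    """Fraction of validation points that k-NN labels correctly."""
    correct = sum(knn_predict(train, x, k) == label for x, label in validation)
    return correct / len(validation)


def tune(candidates, score):
    """Generic tuner: return the candidate with the highest score (first one on ties)."""
    return max(candidates, key=score)


# Toy data, invented for illustration: label 0 on the left, 1 on the right,
# with two deliberately mislabeled training points at x=3 and x=8.
TRAIN = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
VALIDATION = [(2.9, 0), (3.2, 0), (7.9, 1), (8.3, 1), (1.5, 0), (6.5, 1)]

if __name__ == "__main__":
    best_k = tune([1, 3, 5], lambda k: accuracy(TRAIN, VALIDATION, k))
    print("best k:", best_k)
```

The tuner knows nothing about k-NN; it only needs a list of candidates and a scoring function, so the same loop could wrap any other algorithm.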