In this talk Gerbert will give an overview of Artificial Intelligence, outline the current state of the art in research, and explain what it takes to actually do an AI project. Using practical cases and tools, he will give you insight into the phases of an AI project, explain some of the problems you might encounter along the way, and show how you might solve them.
3. Discover value / Deploy solutions / Accelerate teams
● Discover value: compile a strategic roadmap of viable business cases
● Deploy solutions: implement scalable solutions with maximum business impact
● Accelerate teams: inherit skills & best practices with expert coaching
BrainCreators applies decades of experience in artificial intelligence to business challenges across all verticals
15. "AI is whatever hasn't been done yet"
Larry Tesler, the man who invented copy & paste while working at Xerox PARC (1973-1976)
16. What is AI?
Word cloud: Artificial Intelligence, Machine learning, Deep learning, Neural Networks, Robotics, Big data, Self-learning, Cognitive modeling, Prediction, Recognition, Classification, Regression, Data Analytics, Semantic reasoning, Natural Language Processing
30. The source of most of these projects is freely available...
...but usually the data is not!
32. Unreasonable effectiveness of data
Data beats algorithms!
Source: Scaling to Very Very Large Corpora for Natural Language Disambiguation (Banko & Brill, Microsoft Research, 2001)
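The Banko & Brill finding can be illustrated with a toy experiment (everything below, the task, the 500 contexts, the memorizing "learner", is made up for illustration): keep the algorithm trivially simple and only add data, and accuracy still climbs.

```python
import random

random.seed(42)

# A toy "disambiguation" task: each of 500 contexts has one fixed correct label.
CONTEXTS = 500
truth = {c: c % 2 for c in range(CONTEXTS)}

def train(n_samples):
    """A deliberately dumb learner: memorize the label of every context seen."""
    model = {}
    for _ in range(n_samples):
        c = random.randrange(CONTEXTS)
        model[c] = truth[c]
    return model

def accuracy(model):
    # Unseen contexts fall back to a default guess of 0.
    return sum(model.get(c, 0) == truth[c] for c in range(CONTEXTS)) / CONTEXTS

acc_small = accuracy(train(50))      # little data
acc_large = accuracy(train(10_000))  # 200x more data, same trivial algorithm
print(acc_small, acc_large)          # accuracy climbs toward 1.0 with more data
```

The mechanism is coverage: the simple learner keeps improving for as long as new data keeps covering new cases, which is the point the paper makes about very large corpora.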
33. Unreasonable effectiveness of data
If the product is free, you are the product:
● Google: free email, free image storage, free maps, free video storage, free search, free mobile phone OS, free video calls, free translations
● Facebook: free social media channel, free messages, free image storage, free video storage
● Amazon: shopping search data, voice data (Alexa), music & video taste data (Prime Music / Video), fashion data (Echo Look)
● Microsoft: free search, Office 365 usage, professional network data (LinkedIn)
34. Unreasonable effectiveness of data
Source: The Internet of Things: Getting Ready to Embrace Its Impact on the Digital Economy, IDC 2016
35. So what if you are not one of these companies?
Source: https://techcrunch.com/2017/09/30/ai-hype-has-peaked-so-whats-next/
42. Publicly available Data
● Starting point for any project
● Essential to know what is available for commercial use and what is not
● Good sources (the full list would be endless):
○ http://publicdata.eu/dataset.html (~48K datasets)
○ https://catalog.data.gov/dataset (~197K datasets)
○ https://datahub.io/dataset (~12K datasets)
○ http://lod-cloud.net/ (~1K datasets)
○ https://open.nasa.gov/open-data/
http://lod-cloud.net/versions/2017-02-20/lod.svg
45. Scraping Data
● Lots of relevant data can be found on specific websites
● The structure of the data available on target sites allows for some level of automatic tagging
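A minimal sketch of structure-driven scraping using only Python's standard library. The HTML snippet and its class names are hypothetical, but they show the idea: the markup's own structure (`class="name"`, `class="price"`) supplies the tags for free.

```python
from html.parser import HTMLParser

# Hypothetical product-listing snippet; the site's markup provides the labels.
HTML = """
<ul>
  <li class="product"><span class="name">Bolt M6</span><span class="price">0.12</span></li>
  <li class="product"><span class="name">Nut M6</span><span class="price">0.08</span></li>
</ul>
"""

class ProductScraper(HTMLParser):
    """Collects (name, price) records by watching class attributes."""
    def __init__(self):
        super().__init__()
        self.field = None   # which field the next text chunk belongs to
        self.rows = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "product":
            self.rows.append({})          # start a new record
        elif cls in ("name", "price"):
            self.field = cls              # remember where the text goes

    def handle_data(self, data):
        if self.field and self.rows:
            self.rows[-1][self.field] = data.strip()
            self.field = None

scraper = ProductScraper()
scraper.feed(HTML)
print(scraper.rows)
# → [{'name': 'Bolt M6', 'price': '0.12'}, {'name': 'Nut M6', 'price': '0.08'}]
```

In practice you would fetch the pages over HTTP and handle messier markup, but the automatic-tagging principle is exactly this.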
74. Hardware: The Human Brain
10¹¹ neurons
10⁴ synapses per neuron
10¹⁶ "operations" per second (100 petaflops?)
Cortex: 2,500 cm², 2 mm thick
1.4 kg, 1.7 liters
250 million neurons per mm³
180,000 km of "wires"
25 Watts
80. Hardware: AI chips
Startup valued at ~900M
Google TPU (Tensor Processing Unit)
Intel Nervana (Neural Network Processor)
Graphcore IPU (Intelligent Processing Unit), ~110M funding
Microsoft Project Brainwave
82. Containerization
IT needs control:
● Portability (on-premise, cloud)
● Data security / network isolation
● Agility and elasticity
● Standardized environments (dev, test, production)
● Higher resource utilization
Data scientists need flexibility:
● Faster development lifecycles
● A different set of tools
● Different versions
● Default packaging
● Repeatable builds
84. Google Tensorflow
Pros:
● Computational graph abstraction (enables optimization and deployment).
● TensorBoard for visualization.
Cons:
● Computational graph abstraction (harder to debug than eager execution).
● Lack of pre-trained models.
● Not completely open-source.
85. Microsoft Cognitive Toolkit
Pros:
● It is very flexible.
● Allows for distributed training.
● Supports C++, C#, Java, and Python.
● Significant recurrent neural network modelling capabilities.
Cons:
● Networks are described in a new language, the Network Description Language (NDL).
● Lack of visualizations.
86. Berkeley Caffe
Pros:
● Supports Python and MATLAB.
● Great performance.
● Allows for the training of models without writing code.
Cons:
● Bad for recurrent networks.
● Not great with new architectures.
87. PyTorch
Pros:
● Great development and debugging experience.
● Pythonic throughout.
Cons:
● Lack of visualizations.
● Deployment to production is harder.
Used by Facebook, Twitter, NVIDIA, Salesforce.
88. "We decided to marry PyTorch and Caffe2, which gives production-level readiness to PyTorch"
89. Things to consider
● Ecosystem and code availability: code examples, latest research available
● Research versus production: hardened for serving at scale
● Mobile support: fast kernels for ARM, Metal, etc.
● Language bindings: support for R, Scala, Java
● Programming style: imperative versus declarative
● Compute and memory footprint: platform scalability
● Scalability and performance: efficient multi-GPU and multi-instance support
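The "imperative versus declarative" distinction can be sketched in a few lines of plain Python. This is a toy graph, not any real framework's API: imperative code (PyTorch-style) computes values as it runs, while declarative code (TensorFlow-1-style) first builds a graph that is executed later, possibly many times.

```python
# Imperative style: each line computes a value immediately.
x_val = 3.0
y_val = x_val * 2 + 1   # y_val is 7.0 right now

# Declarative style: build a deferred-computation graph, then run it.
class Node:
    """A node in a tiny computation graph; nothing runs until .run()."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self, feed):
        vals = [i.run(feed) if isinstance(i, Node) else i for i in self.inputs]
        return self.op(vals, feed)

def placeholder(name):
    return Node(lambda vals, feed: feed[name])

def mul(a, b):
    return Node(lambda vals, feed: vals[0] * vals[1], a, b)

def add(a, b):
    return Node(lambda vals, feed: vals[0] + vals[1], a, b)

# Build the graph once...
x = placeholder("x")
y = add(mul(x, 2), 1)       # nothing is computed yet
# ...then execute it on different inputs.
print(y.run({"x": 3.0}))    # → 7.0
```

The declarative form lets a framework optimize and deploy the whole graph before running it; the imperative form is easier to write and debug, which is the trade-off the pros/cons slides above keep circling around.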
91. Case: Logistics
Before application of AI:
● Manual identification of address data for 15% of total volume
● 4% delivered to the wrong address
● Geographical location of delivery points imprecise
● Delivery window too coarse
92. Case: Logistics
AI under the hood:
● Fuzzy-logic address matching
● GPS delivery point prediction
● Time window estimation & optimisation
● Automated location mapping (incl. PO boxes)
● Trained on historic data and self-learning
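Fuzzy address matching can be sketched with Python's standard-library `difflib`. The address list and the 0.75 threshold below are hypothetical, not the production system described above.

```python
import difflib

# Hypothetical master list of known delivery addresses.
KNOWN = [
    "Keizersgracht 123, 1015 CJ Amsterdam",
    "Keizerstraat 12, 1011 EZ Amsterdam",
    "Herengracht 99, 1015 BD Amsterdam",
]

def normalize(addr):
    """Cheap canonicalization before matching: lowercase, strip punctuation."""
    return " ".join(addr.lower().replace(",", " ").split())

def best_match(raw, candidates=KNOWN, threshold=0.75):
    """Return the closest known address, or None if nothing is similar enough."""
    scored = [(difflib.SequenceMatcher(None, normalize(raw), normalize(c)).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None

# A misspelled, incomplete address still resolves to the right delivery point.
print(best_match("keizersgraht 123 amsterdam"))
```

A real system would add domain rules (postcode validation, house-number ranges) on top, but character-level similarity already recovers many of the manual cases.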
93. Case: Logistics
Results:
● Manual correction reduced to <2% of total volume
● Delivery failures reduced by 50%
● 2,000 man-hours saved per month
● Improved customer service through better time windows
94. Case: Steel sheet quality control
General:
● A major European steel producer
● Total of 7.1 million tonnes of steel products in 2016
● High-quality sheet and strip steel
● Automotive, packaging, and construction sectors
95. Case: Steel sheet quality control
Initial situation:
● Kilometers of steel sheet each day
● Accurate quality assessment enables more profitable trading
● Defects need to be detected to prevent machine breakdowns
● Manual inspection supported by an automatic camera system
96. Case: Steel sheet quality control
Camera system:
● Infrared cameras inspect moving steel sheet on conveyor belts
● Basic image processing detects regions of interest
● Manual inspection is often still needed
● Accuracy can still be improved
97. Case: Steel sheet quality control
Data sets:
● Up to 50 different defect types
● 5 million (!) new images each day
● Currently only 25 thousand annotated images available in total
● Severely imbalanced data sets
● Manual annotation is costly
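Two common, simple remedies for such class imbalance can be sketched in a few lines of Python; the defect classes and counts below are hypothetical, not the customer's data.

```python
# Hypothetical annotated set: defect class -> number of labelled images.
counts = {"ok": 20_000, "scratch": 400, "inclusion": 90, "edge_crack": 10}

def class_weights(counts):
    """Weight the training loss inversely to class frequency, so rare
    defects contribute as much to the gradient as common ones."""
    total, k = sum(counts.values()), len(counts)
    return {cls: total / (k * n) for cls, n in counts.items()}

def oversample(samples_by_class):
    """Alternative: repeat minority-class samples until every class
    matches the largest one."""
    target = max(len(s) for s in samples_by_class.values())
    return {cls: [s[i % len(s)] for i in range(target)]
            for cls, s in samples_by_class.items()}

weights = class_weights(counts)
print({c: round(w, 2) for c, w in weights.items()})

balanced = oversample({"ok": ["a", "b", "c", "d"], "crack": ["x"]})
print(balanced["crack"])  # → ['x', 'x', 'x', 'x']
```

With a 2000:1 ratio between "ok" and "edge_crack", doing neither means a model can score 97%+ accuracy while never detecting the rarest defect at all.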
98. Case: Steel sheet quality control
Solution:
● Deep learning for robust image classification
● AI & active learning approach for efficient image annotation
● Integration into existing systems
● Knowledge transfer to the customer's own tech team
● Already more than 90% accurate
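The active-learning idea can be sketched as uncertainty sampling: let the current model score the unlabeled pool and send only the images it is least sure about to a human annotator. The filenames and scores below are hypothetical.

```python
# Hypothetical model outputs: probability that each unlabeled image is a defect.
unlabeled = {
    "img_001.png": 0.97,   # model is confident: almost certainly a defect
    "img_002.png": 0.52,   # model is unsure -> most informative to label
    "img_003.png": 0.08,
    "img_004.png": 0.45,
    "img_005.png": 0.88,
}

def most_uncertain(scores, batch_size=2):
    """Pick the samples whose predicted probability is closest to 0.5."""
    return sorted(scores, key=lambda k: abs(scores[k] - 0.5))[:batch_size]

# Annotate these, retrain, re-score the pool, repeat.
print(most_uncertain(unlabeled))  # → ['img_002.png', 'img_004.png']
```

Iterating this loop concentrates the costly manual annotation on the images that actually move the model, which is how 25 thousand labels can be stretched against 5 million daily images.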
99. Case: Stroke diagnostics
Initial situation:
● 46,000 patients affected by a stroke per year in the Netherlands
● Limited time to decide which hospital to send a patient to
● ~6 minutes for a skilled radiologist to identify the stroke location
● Small hospitals do not always have trained radiologists on staff
100. Case: Stroke diagnostics
Approach:
● Imitate the process of expert radiologists
● Train a deep neural network on 3D volumes from the left/right hemispheres
● Compare intensities and local brain structures to discern affected from healthy regions
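The left/right comparison can be sketched in one dimension (a toy, not the actual pipeline): mirror the data and flag positions where intensity differs sharply from the mirrored side, since a healthy brain is roughly symmetric. The voxel values below are made up.

```python
# One hypothetical row of voxel intensities across both hemispheres;
# a real scan would be a full 3D CT/MRI volume.
row = [10, 12, 11, 30, 12, 11, 12, 10]   # the 30 is the anomaly

def asymmetry(row, threshold=5):
    """Flag positions whose intensity deviates strongly from the
    mirrored position in the opposite hemisphere."""
    mirrored = row[::-1]
    return [i for i, (a, b) in enumerate(zip(row, mirrored))
            if abs(a - b) > threshold]

# Both the anomalous voxel and its mirror position light up.
print(asymmetry(row))  # → [3, 4]
```

Localizing the asymmetric region then narrows down where the network and the visualization tools should look for the occlusion.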
101. Case: Stroke diagnostics
Result:
● 95% classification accuracy in detecting "blocks" with an occlusion present
● Complete process in under 30 seconds
● Visualization tools to localize the area of interest