Using Machine Learning for Automatic
Classification of Companies
IC-SDV 2018
Nice, France April 23rd, 2018
Aleksandar Kapisoda
Content
• General Information
• Project Update
• Deepening the SEARCHCORPUS
• How to improve?
• What is What?
• Approaches, Learnings & Results
• Next steps
• Conclusions
IC-SDV 2018. Aleksandar Kapisoda
Using Machine Learning for Automatic Classification of Companies
General Information
• Family-owned global corporation
• Founded 1885 in Ingelheim, Germany
• Focus on Human pharmaceuticals, Animal health and
biopharmaceutical contract manufacturing
• Around 45,700 employees worldwide
• Four R&D sites worldwide
• R&D expenditure of EUR 3.1 billion
• 17 production facilities (human pharmaceuticals) in
11 countries
• Net sales of around EUR 15.9 billion
• 143 affiliated companies worldwide
• Investment in tangible assets: EUR 645 million
Status: 31.12.2016
Boehringer Ingelheim Center
Our headquarters in Ingelheim
General Information
Scientific Information Center
• 1960 Central Library
• 1990 Scientific Information Services
• 2003 Scientific Library
• 2006 Scientific Information Center
Scope: Access to Knowledge
Main Tasks:
• Global acquisition of external data (scientific databases, literature)
• Scientific Information Sources & Consultancy
Project Update Biotech SEARCHCORPUS®
Project Update & Status Quo
2015: Collecting company URLs
Quickly collected 80,000+ companies, but also “unwanted” targets such as beauty farms and pharmacies
2016 & 2017: Drastic improvement of data quality
Relevance Filtering
(Tagging with domain Ontologies & Taxonomies)
2018: Focusing the Search Space
Classification of the Search Space
Today: > 45,000 biotech companies in 140 countries with more than 8.5 million web pages
Usage:
As word about the Biotech Company SEARCHCORPUS® spreads within BI, we get new and diverse research targets:
• Competitive intelligence
• Business development & Licensing
Deepening the SEARCHCORPUS
https://adexchanger.com/managing-the-data/deepening-data-lake-second-party-data-increases-ai-enterprises/
A data lake is a storage repository that holds a vast amount of data until it is needed.
Marketers with sparse data often do not have enough data to create measurable outcomes in audience targeting through modeling. Source: Chris O’Hara
Deepening the SEARCHCORPUS
SEARCHCORPUS is our storage repository that holds a vast amount of crawled data until it is needed
2015 - 2017
https://adexchanger.com/managing-the-data/deepening-data-lake-second-party-data-increases-ai-enterprises/
In our SEARCHCORPUS we have too much data to create measurable outcomes in targeting the right company.
Deepening the SEARCHCORPUS – Data Lake
https://adexchanger.com/managing-the-data/deepening-data-lake-second-party-data-increases-ai-enterprises/
Second-party data is simply someone else’s first-party data. When relevant insights are added to a data lake, the result is a more robust environment for deeper data-led insights for both targeting and analytics. Source: Chris O’Hara.
Deepening the SEARCHCORPUS
In our case, the second-party data comes from an Information Scientist with domain expertise: she owns the relevant insights. This environment is more robust for deeper data-led insights for targeting the relevant company.
https://adexchanger.com/managing-the-data/deepening-data-lake-second-party-data-increases-ai-enterprises/
2018
Using the domain expertise
• Question:
What is the question?
• Content:
Which information does our internal customer want?
• Search:
What are the right keywords?
Where to search?
• Results
Deepening the SEARCHCORPUS with the relevant insights
Information Scientist
Status Quo
• Currently our SEARCHCORPUS holds too much data to create measurable outcomes in targeting the right company.
• Big Pharma, Biotech Companies, CRO, Digital Health,
Life Science & University
Challenge:
Deepening the SEARCHCORPUS
Deeper data-led insights for targeting the relevant company.
Look for specific types of businesses
e.g. distinguish CROs from R&D startups
Status Quo & Challenge
Focusing the Search Space
• We are looking for a company that licenses its process
• We are not looking for a service provider who could
produce a drug for you according to your procedure
(which you do not have).
• In this case, 40% of the hits are CROs, i.e. false positives.
• We exclude all unsuitable business models and end up with 100% fewer false positives of this kind.
• This reduces the search time or optimizes the quality
of the results
Goal
Search Space
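The effect on search precision can be made concrete with a small sketch. The absolute numbers are hypothetical (100 hits, of which 40% are CROs), and it assumes that every non-CRO hit is relevant, which the slides do not state.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Precision = relevant hits / all returned hits."""
    returned = true_positives + false_positives
    return true_positives / returned if returned else 0.0

# Hypothetical result list: 100 hits, 40% of them CROs (false positives here).
hits = 100
cro_hits = 40
relevant_hits = hits - cro_hits  # assumption: every non-CRO hit is relevant

before = precision(relevant_hits, cro_hits)
# After classification, companies labelled "CRO" are excluded from the search space.
after = precision(relevant_hits, 0)

print(f"precision before filtering: {before:.2f}")
print(f"precision after filtering:  {after:.2f}")
```

In practice the classifier will not be perfect, so the real gain depends on how many CROs it actually catches.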
Deepening the SEARCHCORPUS
Deeper data-led insights for targeting the relevant company.
Improve significantly the Search Precision
in Biotech Company SEARCHCORPUS®
An abstract classification of the sources
e.g. Business Model (CRO)
Deepening the SEARCHCORPUS
Deeper data-led insights for targeting the relevant company.
Goal
How to improve the Search Precision?
https://sipmm.edu.sg/how-artificial-intelligence-revolutionize-procurement/
What is What?
Artificial Intelligence (AI):
Machines in-built with Human Intelligence
Machine Learning (ML):
Approach to achieve Machines with Human Brain
Support Vector Machines
Deep Learning (DL):
Techniques to train the Machine’s Brain
Neural Networks are the foundation for DL
https://blogs.systweak.com/2016/12/artificial-learning-machine-learning-and-deep-learning-know-the-difference/
Approaches, Learnings & Results
https://blogs.systweak.com/2016/12/artificial-learning-machine-learning-and-deep-learning-know-the-difference/
Choosing the Training Algorithms
Training algorithms:
• Classification based on website structure
• Deep Learning: Feed Forward Neural Networks, Recurrent Neural Networks
• Machine Learning: Support Vector Machine
Approaches, Learnings & Results
Preparing the ground for Learning
Creating learning set:
Our Information Specialist categorized 50 company websites into small training sets
Focus areas:
• Big Pharma
• Biotech
• CROs
• Digital Health
• Life Science
• University
Training set
Artificial Neural Networks - A Pathway to Deep Learning
http://adventuresinmachinelearning.com/neural-networks-tutorial/
Artificial Neural Networks are computing systems
vaguely inspired by the biological neural networks
that constitute animal brains. Such systems "learn"
tasks by considering examples, generally without
task-specific programming.
Artificial Neural Network as a Black Box:
Deep Learning
Artificial Neural Networks
Support Vector Machine – Automated Classification & Clustering
In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
Word2vec is an efficient algorithm which uses a simple neural
network for unsupervised learning of word embedding on a
large, unlabeled corpus.
How does Automated Classification & Clustering work?
It consists of dividing the items that make up a collection into categories or classes.
The goal is to accurately predict the target class for each
record in new data.
https://groups.google.com/forum/#!topic/gensim/EwK-6JgkWVI
Machine Learning
Support Vector Machine
Approaches, Learnings & Results
Approach 1
Classification based on website structure
• Our crawlers know the structure of the company websites
• Does the website structure reflect the type of business?
(Then we could e.g. use image classification algorithms)
Result:
• Businesses of the same type may have very small (1 page) or very large websites (1000s of pages)
• Looking at the link structure as an image does not reveal anything about the type of business
Approaches, Learnings & Results
Approach 2
Feed Forward Neural Networks
Deep Learning
• Feed Forward Neural Networks and Term Frequency
For the training of the neural network we used two-layered backpropagation networks, which are classic FFNN classifiers for non-linear problems.
For this approach we converted the input data into a vector using a TF-IDF 1) preprocessor trained on our large corpus.
Result:
• Recognition rates not satisfying
• Training too expensive (lots of nodes due to large input vector)
1) Term frequency – inverse document frequency identifies important terms by setting the number of occurrences of a term in a document in relation to the number of documents in the corpus that contain this term.
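As a rough illustration of the TF-IDF weighting described in this approach, here is a minimal sketch using one common variant (relative term frequency times the logarithm of corpus size over document frequency). The exact weighting used in the project is not given in the slides, and the toy documents are invented.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Toy "websites" (hypothetical text, already tokenized).
docs = [
    "we offer contract research services".split(),
    "we develop novel antibody therapies".split(),
    "we offer clinical trial services".split(),
]
vectors = tf_idf(docs)

# Terms that occur everywhere ("we") get weight 0; rare terms score highest.
for term, weight in sorted(vectors[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {weight:.3f}")
```

With one dimension per vocabulary term, the resulting vector grows with the corpus vocabulary, which is consistent with the large input vector noted in the result.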
Approaches, Learnings & Results
Approach 3
Recurrent Neural Networks for Text Classification
Deep Learning
• Recurrent Neural Networks are good at learning sequences
RNNs can make decisions based on text that is converted into an input vector (Word2Vec / Doc2Vec), and they take the sequence of the words into account. This allows them to find patterns in the content, memorize them and bind them to specific classes.
Result:
• No convergence because of the small training sets (50 samples only)
• Recognition rate not satisfying
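Why word order matters to an RNN can be shown with a toy recurrence. This is only a sketch: a single hidden unit with hand-picked weights and invented one-dimensional "embeddings", not the network or the Word2Vec / Doc2Vec vectors used in the project.

```python
import math

def rnn_final_state(inputs, w_x=1.0, w_h=0.8):
    """Run a one-unit Elman-style recurrence and return the last hidden state."""
    h = 0.0
    for x in inputs:
        # The new state depends on the current input AND the previous state.
        h = math.tanh(w_x * x + w_h * h)
    return h

# Hypothetical one-dimensional "embeddings" for two tokens.
embedding = {"licenses": 0.5, "services": -0.3}

seq_a = [embedding["licenses"], embedding["services"]]
seq_b = [embedding["services"], embedding["licenses"]]

state_a = rnn_final_state(seq_a)
state_b = rnn_final_state(seq_b)

# A bag-of-words style sum cannot tell the two sequences apart ...
same_bag = sum(seq_a) == sum(seq_b)
# ... but the recurrent state differs, because order feeds into it.
print(same_bag, round(state_a, 4), round(state_b, 4))
```

A real RNN learns its weights from labelled sequences, which is where the small training set becomes the limiting factor.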
Approaches, Learnings & Results
Final Approach
Support Vector Machine with Normalized Input
Machine Learning
SVMs can create non-linear classifiers by transforming the input space into a high-dimensional one. They therefore offer a good compromise between complexity and performance.
To obtain a reasonably sized input vector (remember, we classify a whole website, which may have several 100 MB of content), we use the preprocessor from our FFNN approach with some magic.
Result:
• For all 6 real-world samples we got > 96% average recognition rate
• Preparation and training are easy enough for data scientists to create new classes on the fly without programming effort
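The idea of making a problem linearly separable by moving it into a higher-dimensional space can be sketched on toy data. This is an illustration only: it uses an explicit feature map as a stand-in for a kernel and a plain hinge-loss sub-gradient trainer. The kernel, library and preprocessing actually used in the project are not described in the slides, and all names below are hypothetical.

```python
def feature_map(x):
    """Map a 2-D point into 3-D by adding the product of its coordinates."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(samples, labels, epochs=100, lr=0.1, lam=0.01):
    """Sub-gradient descent on the regularized hinge loss (no bias term)."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            violated = y * dot(w, x) < 1  # inside the margin or misclassified
            for i in range(len(w)):
                grad = lam * w[i] - (y * x[i] if violated else 0.0)
                w[i] -= lr * grad
    return w

def accuracy(w, samples, labels):
    predictions = [1 if dot(w, x) > 0 else -1 for x in samples]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# XOR-like toy data: the label is the sign of x1 * x2.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]

# A linear classifier on the raw 2-D input cannot separate the classes.
w_raw = train_linear_svm(points, labels)
# The same classifier on the mapped 3-D input can.
mapped = [feature_map(p) for p in points]
w_mapped = train_linear_svm(mapped, labels)

acc_raw = accuracy(w_raw, points, labels)
acc_mapped = accuracy(w_mapped, mapped, labels)
print(acc_raw, acc_mapped)
```

Kernel SVMs achieve the same effect without computing the mapped features explicitly, which keeps the cost manageable for large input vectors.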
Next Steps
Integration
The verified approach for website classification based on Support Vector Machines will be integrated into the Deep SEARCH 9® development environment.
Optimization
The Biotech SEARCHCORPUS® will be further optimized by dynamically dividing the search space into clusters that reflect the focus of all current research targets.
More Sources
Additional sources will be crawled, as new websites can be classified before they are added to the SEARCHCORPUS®.
Further development
The classification approach will be further developed with the goal of recognizing companies developing technologies that have not been monitored so far.
Conclusions
Acknowledgements
Klaus Kater
Developing Partner
Deep Search 9
Dr. Gabriele Becher
Boehringer Ingelheim Pharma GmbH & Co. KG
Contact Information
Aleksandar Kapisoda
aleksandar.kapisoda@boehringer-ingelheim.com
Discovery Research Coordination – Scientific Information Center
Thank You!
Questions?
For more information have a look at:
www.boehringer-ingelheim.com
www.opnme.com
© Boehringer Ingelheim International GmbH 2017

More Related Content

What's hot

ICIC 2013 Conference Proceedings Tony Trippe Patinformatics
ICIC 2013 Conference Proceedings Tony Trippe PatinformaticsICIC 2013 Conference Proceedings Tony Trippe Patinformatics
ICIC 2013 Conference Proceedings Tony Trippe Patinformatics
Dr. Haxel Consult
 
AI-SDV 2021: Heiko Wongel - Machine learning tools in patent searching - are ...
AI-SDV 2021: Heiko Wongel - Machine learning tools in patent searching - are ...AI-SDV 2021: Heiko Wongel - Machine learning tools in patent searching - are ...
AI-SDV 2021: Heiko Wongel - Machine learning tools in patent searching - are ...
Dr. Haxel Consult
 
AI-SDV 2020: Special Hypertext Information Treatment in is Special Hypertext ...
AI-SDV 2020: Special Hypertext Information Treatment in is Special Hypertext ...AI-SDV 2020: Special Hypertext Information Treatment in is Special Hypertext ...
AI-SDV 2020: Special Hypertext Information Treatment in is Special Hypertext ...
Dr. Haxel Consult
 
AI-SDV 2020: AI-augmented Question Answering and Semantic Search for Life Sci...
AI-SDV 2020: AI-augmented Question Answering and Semantic Search for Life Sci...AI-SDV 2020: AI-augmented Question Answering and Semantic Search for Life Sci...
AI-SDV 2020: AI-augmented Question Answering and Semantic Search for Life Sci...
Dr. Haxel Consult
 
IC-SDV 2018: Search Technology / VanatagePoint
IC-SDV 2018: Search Technology / VanatagePointIC-SDV 2018: Search Technology / VanatagePoint
IC-SDV 2018: Search Technology / VanatagePoint
Dr. Haxel Consult
 
II-DV 2017: Averbis
II-DV 2017: AverbisII-DV 2017: Averbis
II-DV 2017: Averbis
Dr. Haxel Consult
 
IC-SDV 2018: Patrick Fievet (WIPO) Automatic Categorization of Patent Documen...
IC-SDV 2018: Patrick Fievet (WIPO) Automatic Categorization of Patent Documen...IC-SDV 2018: Patrick Fievet (WIPO) Automatic Categorization of Patent Documen...
IC-SDV 2018: Patrick Fievet (WIPO) Automatic Categorization of Patent Documen...
Dr. Haxel Consult
 
II-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent OfficeII-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent Office
Dr. Haxel Consult
 
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
Dr. Haxel Consult
 
AI-SDV 2020: Can There Be Profitable Revenue from an AI Deployment? The Upsid...
AI-SDV 2020: Can There Be Profitable Revenue from an AI Deployment? The Upsid...AI-SDV 2020: Can There Be Profitable Revenue from an AI Deployment? The Upsid...
AI-SDV 2020: Can There Be Profitable Revenue from an AI Deployment? The Upsid...
Dr. Haxel Consult
 
IC-SDV 2018: Harald Jenny (CENTREDOC) When Artificial Intelligence Joins Inte...
IC-SDV 2018: Harald Jenny (CENTREDOC) When Artificial Intelligence Joins Inte...IC-SDV 2018: Harald Jenny (CENTREDOC) When Artificial Intelligence Joins Inte...
IC-SDV 2018: Harald Jenny (CENTREDOC) When Artificial Intelligence Joins Inte...
Dr. Haxel Consult
 
Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...
Pistoia Alliance
 
II-SDV 2017: Gridlogics Technologies
II-SDV 2017: Gridlogics TechnologiesII-SDV 2017: Gridlogics Technologies
II-SDV 2017: Gridlogics Technologies
Dr. Haxel Consult
 
II-SDV 2017: Effective Communication of Complex Monitoring Results: An innova...
II-SDV 2017: Effective Communication of Complex Monitoring Results: An innova...II-SDV 2017: Effective Communication of Complex Monitoring Results: An innova...
II-SDV 2017: Effective Communication of Complex Monitoring Results: An innova...
Dr. Haxel Consult
 
AI-SDV 2021 Biomax
AI-SDV 2021 BiomaxAI-SDV 2021 Biomax
AI-SDV 2021 Biomax
Dr. Haxel Consult
 
From Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceFrom Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data Science
Institute of Contemporary Sciences
 
AI-SDV 2021 - Klaus Kater - The secret of successful CI: precise targeting + ...
AI-SDV 2021 - Klaus Kater - The secret of successful CI: precise targeting + ...AI-SDV 2021 - Klaus Kater - The secret of successful CI: precise targeting + ...
AI-SDV 2021 - Klaus Kater - The secret of successful CI: precise targeting + ...
Dr. Haxel Consult
 
AI-SDV 2020: Biomax
AI-SDV 2020: BiomaxAI-SDV 2020: Biomax
AI-SDV 2020: Biomax
Dr. Haxel Consult
 
VALUENEX Singapore Seminar: Our Services (and Case Study)
VALUENEX Singapore Seminar: Our Services (and Case Study)VALUENEX Singapore Seminar: Our Services (and Case Study)
VALUENEX Singapore Seminar: Our Services (and Case Study)
VALUENEX
 
ICIC 2014 Patent Citation Analysis: Tools and Techniques
ICIC 2014 Patent Citation Analysis: Tools and Techniques ICIC 2014 Patent Citation Analysis: Tools and Techniques
ICIC 2014 Patent Citation Analysis: Tools and Techniques
Dr. Haxel Consult
 

What's hot (20)

ICIC 2013 Conference Proceedings Tony Trippe Patinformatics
ICIC 2013 Conference Proceedings Tony Trippe PatinformaticsICIC 2013 Conference Proceedings Tony Trippe Patinformatics
ICIC 2013 Conference Proceedings Tony Trippe Patinformatics
 
AI-SDV 2021: Heiko Wongel - Machine learning tools in patent searching - are ...
AI-SDV 2021: Heiko Wongel - Machine learning tools in patent searching - are ...AI-SDV 2021: Heiko Wongel - Machine learning tools in patent searching - are ...
AI-SDV 2021: Heiko Wongel - Machine learning tools in patent searching - are ...
 
AI-SDV 2020: Special Hypertext Information Treatment in is Special Hypertext ...
AI-SDV 2020: Special Hypertext Information Treatment in is Special Hypertext ...AI-SDV 2020: Special Hypertext Information Treatment in is Special Hypertext ...
AI-SDV 2020: Special Hypertext Information Treatment in is Special Hypertext ...
 
AI-SDV 2020: AI-augmented Question Answering and Semantic Search for Life Sci...
AI-SDV 2020: AI-augmented Question Answering and Semantic Search for Life Sci...AI-SDV 2020: AI-augmented Question Answering and Semantic Search for Life Sci...
AI-SDV 2020: AI-augmented Question Answering and Semantic Search for Life Sci...
 
IC-SDV 2018: Search Technology / VanatagePoint
IC-SDV 2018: Search Technology / VanatagePointIC-SDV 2018: Search Technology / VanatagePoint
IC-SDV 2018: Search Technology / VanatagePoint
 
II-DV 2017: Averbis
II-DV 2017: AverbisII-DV 2017: Averbis
II-DV 2017: Averbis
 
IC-SDV 2018: Patrick Fievet (WIPO) Automatic Categorization of Patent Documen...
IC-SDV 2018: Patrick Fievet (WIPO) Automatic Categorization of Patent Documen...IC-SDV 2018: Patrick Fievet (WIPO) Automatic Categorization of Patent Documen...
IC-SDV 2018: Patrick Fievet (WIPO) Automatic Categorization of Patent Documen...
 
II-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent OfficeII-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent Office
 
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
 
AI-SDV 2020: Can There Be Profitable Revenue from an AI Deployment? The Upsid...
AI-SDV 2020: Can There Be Profitable Revenue from an AI Deployment? The Upsid...AI-SDV 2020: Can There Be Profitable Revenue from an AI Deployment? The Upsid...
AI-SDV 2020: Can There Be Profitable Revenue from an AI Deployment? The Upsid...
 
IC-SDV 2018: Harald Jenny (CENTREDOC) When Artificial Intelligence Joins Inte...
IC-SDV 2018: Harald Jenny (CENTREDOC) When Artificial Intelligence Joins Inte...IC-SDV 2018: Harald Jenny (CENTREDOC) When Artificial Intelligence Joins Inte...
IC-SDV 2018: Harald Jenny (CENTREDOC) When Artificial Intelligence Joins Inte...
 
Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...
 
II-SDV 2017: Gridlogics Technologies
II-SDV 2017: Gridlogics TechnologiesII-SDV 2017: Gridlogics Technologies
II-SDV 2017: Gridlogics Technologies
 
II-SDV 2017: Effective Communication of Complex Monitoring Results: An innova...
II-SDV 2017: Effective Communication of Complex Monitoring Results: An innova...II-SDV 2017: Effective Communication of Complex Monitoring Results: An innova...
II-SDV 2017: Effective Communication of Complex Monitoring Results: An innova...
 
AI-SDV 2021 Biomax
AI-SDV 2021 BiomaxAI-SDV 2021 Biomax
AI-SDV 2021 Biomax
 
From Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceFrom Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data Science
 
AI-SDV 2021 - Klaus Kater - The secret of successful CI: precise targeting + ...
AI-SDV 2021 - Klaus Kater - The secret of successful CI: precise targeting + ...AI-SDV 2021 - Klaus Kater - The secret of successful CI: precise targeting + ...
AI-SDV 2021 - Klaus Kater - The secret of successful CI: precise targeting + ...
 
AI-SDV 2020: Biomax
AI-SDV 2020: BiomaxAI-SDV 2020: Biomax
AI-SDV 2020: Biomax
 
VALUENEX Singapore Seminar: Our Services (and Case Study)
VALUENEX Singapore Seminar: Our Services (and Case Study)VALUENEX Singapore Seminar: Our Services (and Case Study)
VALUENEX Singapore Seminar: Our Services (and Case Study)
 
ICIC 2014 Patent Citation Analysis: Tools and Techniques
ICIC 2014 Patent Citation Analysis: Tools and Techniques ICIC 2014 Patent Citation Analysis: Tools and Techniques
ICIC 2014 Patent Citation Analysis: Tools and Techniques
 

Similar to IC-SDV 2018: Aleksandar Kapisoda (Boehringer) Using Machine Learning for Automatic Classification of Companies

Democratizing Apache Spark for the Enterprise with Jonathan Gole
Democratizing Apache Spark for the Enterprise with Jonathan GoleDemocratizing Apache Spark for the Enterprise with Jonathan Gole
Democratizing Apache Spark for the Enterprise with Jonathan Gole
Databricks
 
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Dataconomy Media
 
Efficient Data Labelling for Ocular Imaging
Efficient Data Labelling for Ocular ImagingEfficient Data Labelling for Ocular Imaging
Efficient Data Labelling for Ocular Imaging
PetteriTeikariPhD
 
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at ScaleInfrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Robb Boyd
 
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...
Jürgen Ambrosi
 
Smarter Analytics: Supporting the Enterprise with Automation
Smarter Analytics: Supporting the Enterprise with AutomationSmarter Analytics: Supporting the Enterprise with Automation
Smarter Analytics: Supporting the Enterprise with Automation
Inside Analysis
 
A6 big data_in_the_cloud
A6 big data_in_the_cloudA6 big data_in_the_cloud
A6 big data_in_the_cloud
Dr. Wilfred Lin (Ph.D.)
 
Borys Pratsiuk "How to be NVidia partner"
Borys Pratsiuk "How to be NVidia partner"Borys Pratsiuk "How to be NVidia partner"
Borys Pratsiuk "How to be NVidia partner"
Lviv Startup Club
 
AI-SDV Meeting in Nice
AI-SDV Meeting in NiceAI-SDV Meeting in Nice
AI-SDV Meeting in Nice
Dr. Haxel Consult
 
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AIAWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
Amazon Web Services
 
Data Mining and Data Warehouse
Data Mining and Data WarehouseData Mining and Data Warehouse
Data Mining and Data Warehouse
Anupam Sharma
 
DX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to workDX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to work
Principled Technologies
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Marcel Kurovski
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
inovex GmbH
 
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and CarsPractical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Alexey Rybakov
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
Data Science Milan
 
AI Orange Belt - Session 3
AI Orange Belt - Session 3AI Orange Belt - Session 3
AI Orange Belt - Session 3
AI Black Belt
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Carol McDonald
 
Why mTAB?
Why mTAB?Why mTAB?
Why mTAB?
Brad Hontz
 
Everyday Data Science
Everyday Data ScienceEveryday Data Science
Everyday Data Science
Paul Laughlin
 

Similar to IC-SDV 2018: Aleksandar Kapisoda (Boehringer) Using Machine Learning for Automatic Classification of Companies (20)

Democratizing Apache Spark for the Enterprise with Jonathan Gole
Democratizing Apache Spark for the Enterprise with Jonathan GoleDemocratizing Apache Spark for the Enterprise with Jonathan Gole
Democratizing Apache Spark for the Enterprise with Jonathan Gole
 
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
 
Efficient Data Labelling for Ocular Imaging
Efficient Data Labelling for Ocular ImagingEfficient Data Labelling for Ocular Imaging
Efficient Data Labelling for Ocular Imaging
 
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at ScaleInfrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
 
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...
 
Smarter Analytics: Supporting the Enterprise with Automation
Smarter Analytics: Supporting the Enterprise with AutomationSmarter Analytics: Supporting the Enterprise with Automation
Smarter Analytics: Supporting the Enterprise with Automation
 
A6 big data_in_the_cloud
A6 big data_in_the_cloudA6 big data_in_the_cloud
A6 big data_in_the_cloud
 
Borys Pratsiuk "How to be NVidia partner"
Borys Pratsiuk "How to be NVidia partner"Borys Pratsiuk "How to be NVidia partner"
Borys Pratsiuk "How to be NVidia partner"
 
AI-SDV Meeting in Nice
AI-SDV Meeting in NiceAI-SDV Meeting in Nice
AI-SDV Meeting in Nice
 
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AIAWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
 
Data Mining and Data Warehouse
Data Mining and Data WarehouseData Mining and Data Warehouse
Data Mining and Data Warehouse
 
DX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to workDX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to work
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and CarsPractical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
 
AI Orange Belt - Session 3
AI Orange Belt - Session 3AI Orange Belt - Session 3
AI Orange Belt - Session 3
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
Why mTAB?
Why mTAB?Why mTAB?
Why mTAB?
 
Everyday Data Science
Everyday Data ScienceEveryday Data Science
Everyday Data Science
 

IC-SDV 2018: Aleksandar Kapisoda (Boehringer) Using Machine Learning for Automatic Classification of Companies

  • 1. Using Machine Learning for Automatic Classification of Companies IC-SDV 2018 Nice, France April 23rd, 2018 Aleksandar Kapisoda
  • 2. • General Information • Project Update • Deeping the SEARCHCORPUS • How to improve? • What is What? • Approaches, Learnings & Results • Next steps • Conclusions Content IC-SDV 2018. Aleksandar Kapisoda Using Machine Learning for Automatic Classification of Companies
  • 3. General Information • Family-owned global corporation • Founded 1885 in Ingelheim, Germany • Focus on human pharmaceuticals, animal health and biopharmaceutical contract manufacturing • Around 45,700 employees worldwide • Four R&D sites worldwide • R&D expenditure of EUR 3.1 billion • 17 production facilities (human pharmaceuticals) in 11 countries • Net sales of around EUR 15.9 billion • 143 affiliated companies worldwide • Investment in tangible assets: EUR 645 million (Status: 31.12.2016) Boehringer Ingelheim Center: our headquarters in Ingelheim
  • 4. General Information: Scientific Information Center • 1960 Central Library • 1990 Scientific Information Services • 2003 Scientific Library • 2006 Scientific Information Center Scope: Access to Knowledge Main tasks: • Global acquisition of external data (scientific databases, literature) • Scientific information sources & consultancy
  • 5. Project Update & Status Quo: Biotech SEARCHCORPUS® 2015: Collecting company URLs. Quickly collected 80,000+ companies, but also “unwanted” targets like beauty farms and pharmacies. 2016 & 2017: Drastic improvement of data quality through relevance filtering (tagging with domain ontologies & taxonomies). 2018: Focusing the search space through classification of the search space. Today: more than 45,000 biotech companies in 140 countries with more than 8.5 million web pages. Usage: As word about the Biotech Company SEARCHCORPUS® spreads within BI, we get new, diverse research targets: • Competitive intelligence • Business development & licensing
  • 6. Deeping the SEARCHCORPUS. A data lake is a storage repository that holds a vast amount of data until it is needed. Marketers with sparse data often do not have enough data to create measurable outcomes in audience targeting through modeling. Source: Chris O’Hara, https://adexchanger.com/managing-the-data/deepening-data-lake-second-party-data-increases-ai-enterprises/
  • 7. Deeping the SEARCHCORPUS (2015 - 2017). SEARCHCORPUS is our storage repository that holds a vast amount of crawled data until it is needed. In our SEARCHCORPUS we have too much data to create measurable outcomes in targeting the right company. https://adexchanger.com/managing-the-data/deepening-data-lake-second-party-data-increases-ai-enterprises/
  • 8. Deeping the SEARCHCORPUS – Data Lake. Second-party data is simply someone else’s first-party data. When relevant insights are added to a data lake, the result is a more robust environment for deeper data-led insights for both targeting and analytics. Source: Chris O’Hara, https://adexchanger.com/managing-the-data/deepening-data-lake-second-party-data-increases-ai-enterprises/
  • 9. Deeping the SEARCHCORPUS (2018). In our case, second-party data is an Information Scientist with domain expertise; she owns the relevant insights. This environment is more robust for deeper data-led insights for targeting the relevant company. https://adexchanger.com/managing-the-data/deepening-data-lake-second-party-data-increases-ai-enterprises/
  • 10. Deeping the SEARCHCORPUS with the relevant insights (Information Scientist). Using the domain expertise: • Question: What is the question? • Content: Which information does our internal customer want? • Search: What are the right keywords? Where to search? • Results
  • 11. Deeping the SEARCHCORPUS: Status Quo & Challenge. Status quo: • Currently our SEARCHCORPUS holds too much data to create measurable outcomes in targeting the right company. • Big Pharma, Biotech Companies, CRO, Digital Health, Life Science & University. Challenge: Deeper data-led insights for targeting the relevant company. Look for specific types of businesses, e.g. distinguish CROs from R&D startups.
  • 12. Deeping the SEARCHCORPUS: Focusing the Search Space (Search Space / Goal). • We are looking for a company that licenses its process. • We are not looking for a service provider who could produce a drug for you according to your procedure (which you do not have). • In this case, 40% of the results are CROs and thus false positives. • We exclude all unsuitable business models and get 100% fewer false positives. • This reduces the search time or improves the quality of the results. Deeper data-led insights for targeting the relevant company.
  • 13. Deeping the SEARCHCORPUS: Goal. Significantly improve the search precision in the Biotech Company SEARCHCORPUS® through an abstract classification of the sources, e.g. by business model (CRO). Deeper data-led insights for targeting the relevant company.
  • 14. How to improve the Search Precision? https://sipmm.edu.sg/how-artificial-intelligence-revolutionize-procurement/
  • 15. What is What? Artificial Intelligence (AI): Machines in-built with Human Intelligence. Machine Learning (ML): Approach to achieve Machines with Human Brain (e.g. Support Vector Machines). Deep Learning (DL): Techniques to train the Machine’s Brain; Neural Networks are the foundation for DL. https://blogs.systweak.com/2016/12/artificial-learning-machine-learning-and-deep-learning-know-the-difference/
  • 16. Approaches, Learnings & Results: Choosing the Training Algorithms. • Classification based on website structure • Deep Learning: Feed Forward Neural Networks, Recurrent Neural Networks • Machine Learning: Support Vector Machine. https://blogs.systweak.com/2016/12/artificial-learning-machine-learning-and-deep-learning-know-the-difference/
  • 17. Approaches, Learnings & Results: Training Set. Preparing the ground for learning. Creating the learning set: Our Information Specialist categorized 50 company websites into small training sets. Focus areas: • Big Pharma • Biotech • CROs • Digital Health • Life Science • University
  • 18. Deep Learning: Artificial Neural Networks - A Pathway to Deep Learning. Artificial Neural Networks are computing systems vaguely inspired by the biological neural networks that constitute animal brains. Such systems "learn" tasks by considering examples, generally without task-specific programming. [Figure: Artificial Neural Network as a black box] http://adventuresinmachinelearning.com/neural-networks-tutorial/
  • 19. Machine Learning: Support Vector Machine – Automated Classification & Clustering. In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Word2vec is an efficient algorithm which uses a simple neural network for unsupervised learning of word embeddings on a large, unlabeled corpus. How does automated classification & clustering work? It consists of dividing the items that make up a collection into categories or classes. The goal is to accurately predict the target class for each record in new data. https://groups.google.com/forum/#!topic/gensim/EwK-6JgkWVI
  • 20. Approaches, Learnings & Results. Approach 1: Classification based on website structure. • Our crawlers know the structure of the company websites • Does the website structure reflect the type of business? (Then we could e.g. use image classification algorithms) Result: • Businesses of the same type may have very small (1 page) or very large websites (1000s of pages) • Looking at the link structure as an image does not reveal anything about the type of business
  • 21. Approaches, Learnings & Results. Approach 2 (Deep Learning): Feed Forward Neural Networks and Term Frequency. For the training of the neural network we used two-layered backpropagation networks, which are classic FFNN classifiers for non-linear problems. For this approach we converted the input data into a vector using a TF-IDF preprocessor trained on our large corpus. (TF-IDF, term frequency – inverse document frequency, identifies important terms by setting the number of occurrences of a term in a document in relation to the number of documents in the corpus that contain this term.) Result: • Recognition rates not satisfactory • Training too expensive (lots of nodes due to large input vector)
  • 22. Approaches, Learnings & Results. Approach 3 (Deep Learning): Recurrent Neural Networks for Text Classification. Recurrent Neural Networks are good at learning sequences. RNNs can make decisions based on text that is converted into an input vector (Word2Vec / Doc2Vec) and take the sequence of the words into account, which allows them to find patterns in the content, memorize them and bind them to specific classes. Result: • No convergence, because of the small training sets (50 samples only) • Recognition rate not satisfactory
  • 23. Approaches, Learnings & Results. Final Approach (Machine Learning): Support Vector Machine with Normalized Input. SVMs can create non-linear classifiers by transforming the input space into a high-dimensional one. Therefore they offer a good compromise between complexity and performance. To obtain a reasonably sized input vector (remember, we classify a whole website, which may have several 100 MB of content), we use the preprocessor from our FFNN approach with some magic. Result: • For all 6 real-world samples we got > 96% average recognition rate • Preparation and training are easy enough for data scientists to create new classes on the fly without programming effort
  • 24. Next Steps. Integration: The verified approach for website classification based on Support Vector Machines will be integrated into the Deep SEARCH 9® development environment. Optimization: The Biotech SEARCHCORPUS® will be further optimized by dynamically dividing the search space into clusters that reflect the focus of all current research targets. More sources: Additional sources will be crawled, as new websites can be classified before they are added to the SEARCHCORPUS®. Further development: The classification approach will be further developed with the goal of recognizing companies developing technologies not being monitored so far.
  • 26. Acknowledgements: Klaus Kater, Developing Partner Deep Search 9; Dr. Gabriele Becher, Boehringer Ingelheim Pharma GmbH & Co. KG
  • 27. Contact Information: Aleksandar Kapisoda, aleksandar.kapisoda@boehringer-ingelheim.com, Discovery Research Coordination – Scientific Information Center
  • 28. Thank You! Questions? For more information have a look at: www.boehringer-ingelheim.com www.opnme.com © Boehringer Ingelheim International GmbH 2017
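
Illustration for slides 21 and 23: the slides describe a TF-IDF preprocessor whose output is normalized and fed to a Support Vector Machine, but they do not show the implementation (the "magic" is not specified). The following is only a minimal sketch of that general idea, under several assumptions that are not from the talk: a tiny made-up corpus in which one short text stands in for a whole website, the classic tf * log(N/df) weighting, L2 normalization, only two classes instead of six, and a plain linear SVM without bias term trained by subgradient descent on the hinge loss (not the non-linear SVM mentioned on slide 23). All function and variable names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: one short text stands in for a whole crawled website.
TRAIN = [
    ("contract research services clinical trials", +1),  # CRO
    ("clinical trials services for sponsors", +1),       # CRO
    ("novel antibody discovery pipeline", -1),           # Biotech
    ("drug discovery pipeline platform", -1),            # Biotech
]
LABELS = {+1: "CRO", -1: "Biotech"}


def fit_idf(token_docs):
    """Inverse document frequency: log(N / number of documents containing the term)."""
    n_docs = len(token_docs)
    doc_freq = Counter(term for doc in token_docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf, vocab):
    """TF-IDF vector over a fixed vocabulary, scaled to unit (L2) length."""
    counts = Counter(tokens)
    length = max(len(tokens), 1)
    vec = [counts[t] / length * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # all-zero vector stays zero
    return [v / norm for v in vec]


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=50):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            for i in range(len(w)):
                # The hinge term only contributes while the margin is violated.
                hinge = label * x[i] if margin < 1 else 0.0
                w[i] -= lr * (lam * w[i] - hinge)
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


token_docs = [text.split() for text, _ in TRAIN]
y = [label for _, label in TRAIN]
idf = fit_idf(token_docs)
vocab = sorted(idf)
X = [tfidf_vector(doc, idf, vocab) for doc in token_docs]
w = train_linear_svm(X, y)


def classify(text):
    """Map a new text to a class name; terms outside the vocabulary are ignored."""
    return LABELS[predict(w, tfidf_vector(text.split(), idf, vocab))]


print(classify("clinical trials outsourcing"))
print(classify("antibody discovery"))
```

Extending this sketch to the six focus areas from slide 17 would typically mean training one such classifier per class (one-vs-rest); whether the original system works that way is not stated in the slides.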