NLP in action
Agenda
Quick intro in machine learning
Complexity of working with language
Overview of NLP concepts …
and their applications in production
Future
General ML approach
• How to formalize a decision-making process as a mathematical function
• Reduce the error rate of this function
• Formalization and minimization
General ML approach
• A mathematical function requires input of a constant dimension
• We can break an image down into a sequence of brightness levels from 0 to 255
• How can we break language down?
Formalization
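A minimal sketch of the image case, assuming a grayscale picture stored as a list of rows. The function name `image_to_vector` and the tiny 3x2 sample picture are made up for illustration; the point is only that every picture of the same size becomes a vector of the same length.

```python
def image_to_vector(image, width, height):
    """Flatten a grayscale image (a list of rows) into one fixed-length list."""
    if len(image) != height or any(len(row) != width for row in image):
        raise ValueError("image does not match the expected size")
    vector = []
    for row in image:
        for level in row:
            # Brightness levels are expected in the range 0..255.
            if not 0 <= level <= 255:
                raise ValueError("brightness level out of range 0..255")
            vector.append(level)
    return vector


# Hypothetical 3x2 picture, values chosen only for the example.
tiny_image = [
    [0, 128, 255],
    [64, 32, 16],
]
vector = image_to_vector(tiny_image, width=3, height=2)
print(vector)  # [0, 128, 255, 64, 32, 16]
```

Any picture with the same width and height yields a vector of the same dimension, which is what the function on the slide needs.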
General ML approach
• Find cats in pictures
• Each picture is an equation in terms of filter theory
• Compose the equations into a system of equations
• Solve the system of equations to get a cat-finding framework
Formalization
General ML approach
x - 2y = 3
x + y = 6
Subtract the first equation from the second: 3y = 3, so y = 1 and x = 5
Analytical solution
Known values (red on the slide): brightness levels from the sample pictures and the cat position
Unknowns (blue on the slide): neural network coefficients x and y
Pixel 1   Pixel 2   The cat
1         -2        3
1         1         6
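The analytical route can be sketched in a few lines. This is an illustration only: it uses Cramer's rule rather than the subtraction shown on the slide, and the helper name `solve_2x2` is an assumption, not part of the talk.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 with Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# Rows of the table: (pixel 1, pixel 2, the cat).
x, y = solve_2x2(1, -2, 3, 1, 1, 6)
print(x, y)  # 5 1
```

An exact solution like this works for two equations; with millions of pixels and noisy samples it stops being practical.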
General ML approach
x - 2y = 3
x + y = 6
You are a really smart human being, too smart to do analytics
x = 5
y = 1
Stochastic solution
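A hedged sketch of what a stochastic solution can look like for the same two equations: start from a random guess, look at one equation at a time, and nudge x and y to shrink the squared error. The function name, learning rate, step count, and seed are assumptions picked for this toy system, not values from the talk.

```python
import random

# Each equation a*x + b*y = c is stored as (a, b, c).
EQUATIONS = [(1.0, -2.0, 3.0), (1.0, 1.0, 6.0)]


def solve_by_sgd(equations, steps=2000, learning_rate=0.05, seed=0):
    """Approximate the solution with stochastic gradient descent."""
    rng = random.Random(seed)
    x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)  # random first guess
    for _ in range(steps):
        a, b, c = rng.choice(equations)  # pick one sample at random
        error = a * x + b * y - c        # how wrong the current guess is
        # Step against the gradient of error**2 with respect to x and y.
        x -= learning_rate * 2 * error * a
        y -= learning_rate * 2 * error * b
    return x, y


x, y = solve_by_sgd(EQUATIONS)
print(round(x, 3), round(y, 3))
```

The result only approaches x = 5, y = 1 instead of hitting it exactly, but the same loop still works when there are far more equations than anyone would solve by hand.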
Is it possible to break language down?
• Language is spoken sounds or written elements
• A vast number of spoken languages, each with a unique set of sounds
• A long list of writing traditions with local modifications. Signs are used for sounds, words, or expressions
• No common interface at all!
Formalization
Focus on the business goal and leave linguistics to the linguists
• Written text only
• English alphabet only (or Latin only, for example)
Formalization
One-hot encoding
     A  B  C  D  E  F  …  K  L  M  …  O  P  Q  …  Food  Company
A    1  0  0  0  0  0     0  0  0     0  0  0     0     1
P    0  0  0  0  0  0     0  0  0     0  1  0     0     1
P    0  0  0  0  0  0     0  0  0     0  1  0     0     1
L    0  0  0  0  0  0     0  1  0     0  0  0     0     1
E    0  0  0  0  1  0     0  0  0     0  0  0     0     1
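A small sketch of the letter part of this table, assuming an alphabet of the 26 uppercase English letters. The names `one_hot` and `encode_word` are illustrative, and the label columns are left out.

```python
import string

# Assumption: English letters only, uppercased before encoding.
ALPHABET = string.ascii_uppercase


def one_hot(letter):
    """Return a vector with a single 1 at the letter's position."""
    vector = [0] * len(ALPHABET)
    vector[ALPHABET.index(letter)] = 1  # raises ValueError for other symbols
    return vector


def encode_word(word):
    """Encode a word as one row per letter."""
    return [one_hot(letter) for letter in word.upper()]


rows = encode_word("apple")
for letter, row in zip("APPLE", rows):
    print(letter, "".join(str(bit) for bit in row))
```

Every letter becomes a row of the same width, so a word is a sequence of constant-dimension vectors.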
Unstructured data extraction: Sequence neural network
It is not true that drunk driving is safe
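To make the idea of a sequence network concrete, here is a toy recurrent cell that reads a sentence one character at a time and carries a hidden state forward. It is a sketch under loud assumptions: the weights are random and untrained, and the hidden size, seed, alphabet, and all function names are invented for the example. It shows the data flow only, not a working extractor.

```python
import math
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # assumption: letters and space only


def one_hot(ch):
    vec = [0.0] * len(ALPHABET)
    vec[ALPHABET.index(ch)] = 1.0
    return vec


def make_matrix(rows, cols, rng):
    """Small random weights; a real network would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def rnn_step(x, h, w_in, w_rec):
    """One step: mix the current character with the previous hidden state."""
    new_h = []
    for i in range(len(h)):
        total = sum(w * xj for w, xj in zip(w_in[i], x))
        total += sum(w * hk for w, hk in zip(w_rec[i], h))
        new_h.append(math.tanh(total))
    return new_h


def encode_sequence(text, hidden_size=8, seed=0):
    rng = random.Random(seed)
    w_in = make_matrix(hidden_size, len(ALPHABET), rng)
    w_rec = make_matrix(hidden_size, hidden_size, rng)
    h = [0.0] * hidden_size  # empty memory before the first character
    for ch in text.upper():
        h = rnn_step(one_hot(ch), h, w_in, w_rec)
    return h


state = encode_sequence("IT IS NOT TRUE THAT DRUNK DRIVING IS SAFE")
print(len(state))  # 8
```

Because the hidden state depends on everything read so far, the same letters in a different order give a different final state, which is what lets such a network notice the "not" in the sentence.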
Data labeling
• Data labeling is straightforward in some cases, but
• Data might be sensitive
• Data might require rare expertise in exotic languages or a complex business domain
• In our case, synthetic labeled data and production data are the same, so let’s build a feedback loop: retrain the model automatically over time from user interaction with the data
Challenges and approaches
Pretrained models and the fine-tuning boom
• Large models pretrained on huge datasets through an expensive training process
• Tech giants train, you use
• Let’s download and apply word2vec to improve semantic understanding
Minimization
Segmentation, lemmatization, POS
Google Cloud NLP demo
Compose it together
• Segmentation
• Lemmatization
• Word embeddings
• Sequential neural network with a memory unit
• Predict a word
Semantic summarization
With semantic insights we can…
• Prioritize emails by sentiment or topic, based on a summary of the text
• Extract legal and personal names to process and prioritize emails automatically, or even register an application in an internal system automatically
• Predict the business domain from a text summary and automatically forward an insurance compensation claim to the proper department
Sum up text semantics and categorize
With semantic insights we can…
Sum up a word in a sentence and do semantic search; just fine-tune it for your domain
With semantic insights we can…
Encode words in vectors and decode the sentence in another language or style
The age of transformers
• State of the art
• Increased ability to summarize the semantics of text
• Generate text from metadata: the opposite of summarization
• Google BERT, XLNet, RoBERTa
• Our invoice extraction could be implemented with FastText embeddings and BERT classification instead of one-hot encoding + LSTM, making it a state-of-the-art data extraction algorithm
• Increased summarization capabilities boosted the tasks discussed previously
• The new text generation capability opened up intelligent Q&A and chatbots
• AutoML features from cloud providers help build ML solutions without ML knowledge
• Applied in COVID-19 research to help pharma companies with master data management
GPT-3 is coming…
• Largest transformer ever, at an estimated cost between $11.5 million and $27.6 million plus the overhead of parallel GPUs, for infrastructure only
• Made quite a noise with its public API. The demo version of the model produced a lot of offensive texts and articles. Philosophical discussions about control over AI became more relevant.
• In beta now; the release date is unclear. Microsoft acquired a license and will provide GPT-3 power by subscription.
Summary
• Machine learning is formalization and minimization
• Production machine learning depends heavily on business insights and constraints
• Modern NLP concepts are combinations of encoding, summarization, and decoding
• Let’s keep an eye on GPT-3 and stay safe
