April 2018
Natural Language Processing Visualisation with Pynorama
Slavi Marinov, Head of Machine Learning Technology – Man AHL
©Man2018
Plan for today
• NLP primer
• Pynorama demo
• How to plug in your own datasets
A typical text dataset
Source: https://www.wsj.com/articles/developer-of-app-that-harvested-facebook-data-says-it-didnt-prove-useful-1524591027?mod=yahoo_hs&yptr=yahoo
Working with text data
• Text dataset = Documents + Metadata
• Can be quite large
• Metadata is typically wide (100+ columns)
• Many metrics are generated (even wider!)
• NLP models are typically pipelines:
  • Raw document => Parsing => Tokenisation => Lemmatisation => Vectorisation => Classification
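The pipeline listed above can be sketched with the Python standard library alone. Every stage below is a toy stand-in: the function names, the suffix-stripping lemmatiser, and the keyword classifier are assumptions made for illustration and are not part of Pynorama. The point is only that keeping each stage's output makes a document traceable from raw text to label.

```python
import re
from collections import Counter


def parse(raw):
    """Strip markup and leave plain text (a toy HTML tag stripper)."""
    return re.sub(r"<[^>]+>", " ", raw).strip()


def tokenise(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lemmatise(tokens):
    """Toy rule: drop a trailing "s" from words longer than three letters."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def vectorise(lemmas):
    """Bag-of-words vector: lemma -> count."""
    return Counter(lemmas)


def classify(vector):
    """Toy keyword classifier standing in for a real model."""
    positive = {"good", "useful", "gain"}
    negative = {"bad", "poor", "weak"}
    score = sum(vector[w] for w in positive) - sum(vector[w] for w in negative)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


# Stage name -> function, in execution order.
STAGES = [
    ("parsed", parse),
    ("tokens", tokenise),
    ("lemmas", lemmatise),
    ("vector", vectorise),
    ("label", classify),
]


def run_pipeline(raw):
    """Run every stage and keep each intermediate output for inspection."""
    outputs = {"raw": raw}
    value = raw
    for name, stage in STAGES:
        value = stage(value)
        outputs[name] = value
    return outputs
```

Calling `run_pipeline("<p>The gains were good</p>")` returns one entry per stage, which is the kind of per-stage record a pipeline viewer needs.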
Typical understanding tasks
• Exploration: Browse through the metadata and identify documents of interest
• Exploration: Review a (random) sample of documents
• Debugging: Trace a document through my pipeline
• Introspection: Review extreme samples (e.g. sentiment)
• Introspection: Review documents in certain cells of the confusion matrix (e.g. classification)
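Two of the introspection tasks above reduce to simple queries over per-document metadata. The sketch below assumes a hypothetical list of metadata rows with `sentiment`, `true`, and `pred` columns; none of these names come from the talk.

```python
# Hypothetical per-document metadata rows (column names are assumptions).
DOCS = [
    {"id": "a", "sentiment": 0.9, "true": "pos", "pred": "pos"},
    {"id": "b", "sentiment": -0.8, "true": "neg", "pred": "neg"},
    {"id": "c", "sentiment": 0.1, "true": "neg", "pred": "pos"},
    {"id": "d", "sentiment": -0.2, "true": "pos", "pred": "neg"},
    {"id": "e", "sentiment": 0.4, "true": "neg", "pred": "pos"},
]


def extremes(docs, column, n=1):
    """Return the n lowest and the n highest documents by a numeric column."""
    ranked = sorted(docs, key=lambda d: d[column])
    return ranked[:n], ranked[-n:]


def confusion_cell(docs, true_label, predicted_label):
    """Return the documents that fall into one cell of the confusion matrix."""
    return [
        d for d in docs
        if d["true"] == true_label and d["pred"] == predicted_label
    ]


lowest, highest = extremes(DOCS, "sentiment")
false_positives = confusion_cell(DOCS, "neg", "pos")
```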
The text visualisation ecosystem
• displacy: https://explosion.ai/demos/displacy
• pyLDAvis: https://github.com/bmabey/pyLDAvis
• facets: https://pair-code.github.io/facets/
• Embedding projector: https://projector.tensorflow.org/
• Prodigy: https://prodi.gy/docs/
• … and so many more
• seriously: look at http://textvis.lnu.se/
What I need
• Interactive UI to visualise my documents, metadata, and NLP pipelines
• Fast, minimalistic, and easy to plug in
• Extensible and scalable
• Python*
An intern project we can open-source!
(Alex Wettig a.k.a. @CodeCreator)
Pynorama Demo
Pynorama core concepts [1/2]
[Figure: Transforms, Table, Pipeline]
Source: Man Group
Pynorama core concepts [2/2]
[Figure: Pipeline, Viewers]
Source: Man Group
How to plug in?
• Implement 3 functions per dataset
  • get_table(): The table is a container for the metadata of the documents in the corpus (no text!)
  • get_pipeline(): The processing graph
  • get_record(): A document in a given stage of the pipeline
get_table()
• Return a Table instance
• Tables have a length and know how to transform themselves
• Multiple backends: Pandas (of course!), MongoDB (big datasets, lazy loading)
• Out-of-the-box transformations: sample, sort, search, range query, remove nans, compute histogram
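The get_table() contract described above can be illustrated without any particular backend. The sketch below is not Pynorama's Table class: `ListTable`, its methods, and the `CORPUS` dict are hypothetical stand-ins that show the three properties the slide names, namely a length, a self-applied transform that yields a new table, and metadata kept separate from document text.

```python
class ListTable:
    """Toy table backend holding one metadata dict per document (no text)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def rows(self, offset=0, limit=None):
        """Return one page of rows so a frontend never needs the whole table."""
        end = None if limit is None else offset + limit
        return self._rows[offset:end]

    def search(self, column, text):
        """Return a new table with rows whose column contains text (case-insensitive)."""
        needle = text.lower()
        return ListTable(
            row for row in self._rows if needle in str(row[column]).lower()
        )


# Hypothetical corpus: metadata plus a (potentially large) text body per document.
CORPUS = {
    "doc-1": {"source": "wsj", "n_tokens": 812, "text": "Full text of the first article"},
    "doc-2": {"source": "blog", "n_tokens": 95, "text": "Full text of the second article"},
    "doc-3": {"source": "wsj", "n_tokens": 430, "text": "Full text of the third article"},
}


def get_table():
    """Build the table from metadata only, leaving the text out."""
    return ListTable(
        {"key": key, **{name: value for name, value in doc.items() if name != "text"}}
        for key, doc in CORPUS.items()
    )
```

Returning a new table from `search` rather than mutating in place keeps the original available, so transforms can be stacked or undone cheaply.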
[Figure: get_table()]
Source: Man Group
get_pipeline()
• Return a dict representation of the graph
  • dict of nodes
  • key is node name
  • value contains the node type (associated with frontend Viewer) and the list of parents
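A graph in the shape described above might look like the following. The key names `"viewer"` and `"parents"` and the stage names are assumptions for illustration; the viewer types are taken from the off-the-shelf list mentioned in the talk. The helper `stage_order` is hypothetical and simply checks that the dict describes a valid acyclic graph.

```python
def get_pipeline():
    """Describe the processing graph as a dict of nodes."""
    return {
        "raw": {"viewer": "raw", "parents": []},
        "parsed": {"viewer": "xml", "parents": ["raw"]},
        "tokens": {"viewer": "json", "parents": ["parsed"]},
        "syntax": {"viewer": "tree", "parents": ["tokens"]},
        "label": {"viewer": "json", "parents": ["tokens", "syntax"]},
    }


def stage_order(pipeline):
    """Return node names so that every node comes after all of its parents."""
    order, seen = [], set()

    def visit(name, path=()):
        if name in path:
            raise ValueError(f"cycle through {name!r}")
        if name in seen:
            return
        for parent in pipeline[name]["parents"]:
            visit(parent, path + (name,))
        seen.add(name)
        order.append(name)

    for name in pipeline:
        visit(name)
    return order
```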
[Figure: get_pipeline()]
Source: Man Group
get_record(key, stage)
• Tie table and pipeline together
• Receives document key and pipeline node name
• Return data for the frontend Viewer
• Off the shelf Viewers: raw, pdf, xml, json, tree
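As a rough sketch of the lookup described above: the function takes a document key and a stage name and returns whatever the viewer for that stage should display. The `STAGE_OUTPUTS` store and the choice to serialise non-string values as JSON are assumptions made here, not behaviour confirmed by the talk.

```python
import json

# Hypothetical store of per-stage outputs: document key -> stage name -> value.
STAGE_OUTPUTS = {
    "doc-1": {
        "raw": "Apps harvested data.",
        "tokens": ["apps", "harvested", "data"],
    },
}


def get_record(key, stage):
    """Return what the viewer for `stage` should display for document `key`."""
    try:
        value = STAGE_OUTPUTS[key][stage]
    except KeyError:
        raise LookupError(
            f"no output for document {key!r} at stage {stage!r}"
        ) from None
    # Assumption: text viewers take a string as-is, other viewers take JSON.
    return value if isinstance(value, str) else json.dumps(value)
```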
[Figure: get_record(key, stage)]
Source: Man Group
Tying it all together
Source: Man Group
For extra credit
• load(): One-off initialisation of the dataset
• get_config(): Frontend configuration
load()
• Initialise the data once for the duration of the server’s life
• Internally, we cache in Arctic: https://github.com/manahl/arctic
• User can reload on-demand for dynamic datasets
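The load-once, reload-on-demand behaviour described above can be sketched as follows. The `Dataset` class, its `force` flag, and `slow_fetch` are hypothetical, and a plain in-memory attribute stands in for the Arctic cache, whose API is not shown here.

```python
import time


class Dataset:
    """Toy dataset whose expensive initialisation runs once and is then cached."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that does the slow work
        self._data = None
        self.loaded_at = None

    def load(self, force=False):
        """Populate the cache on first use; pass force=True to reload."""
        if self._data is None or force:
            self._data = self._fetch()
            self.loaded_at = time.time()
        return self._data


calls = []


def slow_fetch():
    """Stand-in for an expensive query; records how often it really runs."""
    calls.append(1)
    return [{"key": "doc-1"}, {"key": "doc-2"}]


dataset = Dataset(slow_fetch)
dataset.load()               # first call does the slow work
dataset.load()               # served from the in-memory cache
dataset.load(force=True)     # on-demand reload for a dynamic dataset
```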
[Figure: load()]
Source: Man Group
get_config()
• Encapsulate list of frontend configs
  • List of available transformations
  • List of initially visible columns
  • Name of column containing key (for get_record)
• make_config() helper to… make it
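The three settings listed above fit naturally into one plain dict. The helper below is deliberately named `build_config` rather than `make_config`, because the real helper's signature is not shown in the talk; the field names and the rule that the key column is always visible are assumptions of this sketch.

```python
def build_config(key_column, available_transforms, initial_visible_columns):
    """Bundle the frontend settings into one plain dict (hypothetical layout)."""
    if key_column not in initial_visible_columns:
        # Assumption: keep the key visible so a row can always be opened in a viewer.
        initial_visible_columns = [key_column, *initial_visible_columns]
    return {
        "key_column": key_column,
        "available_transforms": list(available_transforms),
        "initial_visible_columns": list(initial_visible_columns),
    }


def get_config():
    """Frontend configuration for one dataset."""
    return build_config(
        key_column="key",
        available_transforms=["sample", "sort", "search"],
        initial_visible_columns=["source", "n_tokens"],
    )
```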
[Figure: get_config()]
Source: Man Group
Pynorama is modular
• Backend: Flask
• Frontend: React + Redux
• We didn’t even talk about the frontend today (more code there than the backend!)
• Extensibility: transforms, viewers, tables, datasets
Pynorama is open source ;)
• More tests
• More advanced frontend Viewers
• More advanced transformers
• Nice caching story
• Reload scheduling
• Connect with other open-source NLP visualisers, e.g. displacy, pyLDAvis
Contributions welcome!
Questions?
Appendix: A note on transforms
• Transparent to the user
• Functions that take a Table and a metadata dict
  • e.g. for a sorting transform, the metadata may be the column name and the direction
• Returns a new instance of the Table with the transform applied
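The transform shape described above, a function of a table and a metadata dict that returns a new table, can be sketched as follows. The `Table` class, the metadata keys, and the `TRANSFORMS` registry are hypothetical and only illustrate the pattern.

```python
class Table:
    """Toy immutable table: a tuple of row dicts."""

    def __init__(self, rows):
        self.rows = tuple(rows)

    def __len__(self):
        return len(self.rows)


def sort_transform(table, meta):
    """meta: {"column": str, "ascending": bool}"""
    return Table(
        sorted(table.rows, key=lambda r: r[meta["column"]], reverse=not meta["ascending"])
    )


def range_transform(table, meta):
    """meta: {"column": str, "min": number, "max": number}, bounds inclusive."""
    return Table(
        r for r in table.rows if meta["min"] <= r[meta["column"]] <= meta["max"]
    )


# Registry mapping a transform name to its function.
TRANSFORMS = {"sort": sort_transform, "range": range_transform}


def apply_transforms(table, steps):
    """Apply (name, meta) steps in order; the input table is never modified."""
    for name, meta in steps:
        table = TRANSFORMS[name](table, meta)
    return table
```

Because each step returns a fresh table, a list of `(name, meta)` pairs is enough to describe, replay, or persist the current view.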
Transform example
Source: Man Group
Appendix: A note on sessions
• Transparent to the user
• Persist the state of the frontend
  • Which columns are visible
  • Which transforms are applied
  • Which viewers are open
• Multiple backends: JSON on disk, Mongo
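For the JSON-on-disk backend mentioned above, a session store can be as small as two functions. The state layout (`visible_columns`, `transforms`, `open_viewers`) and the function names are assumptions of this sketch rather than Pynorama's actual session format.

```python
import json
import os
import tempfile

# Hypothetical shape of the persisted frontend state.
EXAMPLE_STATE = {
    "visible_columns": ["key", "source"],
    "transforms": [{"name": "sort", "meta": {"column": "n_tokens", "ascending": False}}],
    "open_viewers": ["raw", "tokens"],
}


def save_session(path, state):
    """Write the state to a temp file, then swap it in, so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_path, path)


def load_session(path, default=None):
    """Read the state back, or return `default` if no session was saved yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default
```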
Important Information
This information is communicated and/or distributed by the relevant Man entity identified below (collectively the “Company”) subject to the following conditions and restriction in their
respective jurisdictions.
Opinions expressed are those of the author and may not be shared by all personnel of Man Group plc (‘Man’). These opinions are subject to change without notice, are for information
purposes only and do not constitute an offer or invitation to make an investment in any financial instrument or in any product to which the Company and/or its affiliates provides investment
advisory or any other financial services. Any organisations, financial instrument or products described in this material are mentioned for reference purposes only which should not be
considered a recommendation for their purchase or sale. Neither the Company nor the authors shall be liable to any person for any action taken on the basis of the information provided.
Some statements contained in this material concerning goals, strategies, outlook or other non-historical matters may be forward-looking statements and are based on current indicators and
expectations. These forward-looking statements speak only as of the date on which they are made, and the Company undertakes no obligation to update or revise any forward-looking
statements. These forward-looking statements are subject to risks and uncertainties that may cause actual results to differ materially from those contained in the statements. The Company
and/or its affiliates may or may not have a position in any financial instrument mentioned and may or may not be actively trading in any such securities. This material is proprietary information
of the Company and its affiliates and may not be reproduced or otherwise disseminated in whole or in part without prior written consent from the Company. The Company believes the
content to be accurate. However accuracy is not warranted or guaranteed. The Company does not assume any liability in the case of incorrectly reported or incomplete information. Unless
stated otherwise all information is provided by the Company. Past performance is not indicative of future results.
Australia: To the extent this material is distributed in Australia it is communicated by Man Investments Australia Limited ABN 47 002 747 480 AFSL 240581, which is regulated by the
Australian Securities & Investments Commission (ASIC). This information has been prepared without taking into account anyone’s objectives, financial situation or needs.
European Economic Area: Unless indicated otherwise this material is communicated in the European Economic Area by Man Solutions Limited which is an investment company as defined
in section 833 of the Companies Act 2006 and is authorised and regulated by the UK Financial Conduct Authority (the “FCA”). Man Solutions Limited is registered in England and Wales
under number 3385362 and has its registered office at Riverbank House, 2 Swan Lane, London, EC4R 3AD, England. As an entity which is regulated by the FCA, Man Solutions Limited is
subject to regulatory requirements, which can be found at http://register.fca.org.uk.
Germany: To the extent this material is used in Germany, the communicating entity is Man (Europe) AG, which is authorised and regulated by the Liechtenstein Financial Market Authority
(FMA). Man (Europe) AG is registered in the Principality of Liechtenstein no. FL-0002.420.371-2. Man (Europe) AG is an associated participant in the investor compensation scheme, which
is operated by the Deposit Guarantee and Investor Compensation Foundation PCC (FL-0002.039.614-1) and corresponds with EU law. Further information is available on the Foundation's
website under www.eas-liechtenstein.li. This material is of a promotional nature.
Hong Kong: To the extent this material is distributed in Hong Kong, this material is communicated by Man Investments (Hong Kong) Limited and has not been reviewed by the Securities and
Futures Commission in Hong Kong. This material can only be communicated to intermediaries, and professional clients who are within one of the professional investor exemptions contained
in the Securities and Futures Ordinance and must not be relied upon by any other person(s).
Liechtenstein: To the extent the material is used in Liechtenstein, the communicating entity is Man (Europe) AG, which is regulated by the Financial Market Authority Liechtenstein (FMA).
Man (Europe) AG is registered in the Principality of Liechtenstein no. FL-0002.420.371-2. Man (Europe) AG is an associated participant in the investor compensation scheme, which is
operated by the Deposit Guarantee and Investor Compensation Foundation PCC (FL-0002.039.614-1) and corresponds with EU law. Further information is available on the Foundation's
website under www.eas-liechtenstein.li.
Switzerland: To the extent this material is distributed in Switzerland, this material is communicated by Man Investments AG, which is regulated by the Swiss Financial Market Authority
FINMA.
United States: To the extent this material is distributed in the United States, it is communicated and distributed by Man Investments, Inc. (‘Man Investments’). Man Investments is registered
as a broker-dealer with the SEC and is a member of the Financial Industry Regulatory Authority (‘FINRA’). Man Investments is also a member of the Securities Investor Protection
Corporation (‘SIPC’). Man Investments is a wholly owned subsidiary of Man Group plc. The registration and memberships described above in no way imply a certain level of skill or expertise
or that the SEC, FINRA or the SIPC have endorsed Man Investments. Man Investments, 452 Fifth Avenue, 27th fl., New York, NY 10018.
This material is proprietary information and may not be reproduced or otherwise disseminated in whole or in part without prior written consent. Any data services and information available
from public sources used in the creation of this material are believed to be reliable. However accuracy is not warranted or guaranteed.
180717/RW/GL/R/W


Editor's Notes

  1. Technology is 1 of the 3 core drivers of all recent successes in ML. It requires long-term investments in specialised hardware, software, data sets, and talent. Has led to the rise of tightly integrated groups and hybrid researchers. Creates its own risks to be understood and mitigated. Examples from our work at AHL.