
Bionic Info Pro - Taxonomies and Machine Learning SLA 2014



Presentation for Special Libraries Association on machine assisted taxonomy creation and the human element.

Published in: Education, Technology


  1. Bionic Info Pro: New Takes on an Old Theme
     Machine Learning, Taxonomy Creation, Big Data, Competitive Intelligence, and the Human Element
     Elaine M. Lasda Bergman
     Annual Conference, Special Libraries Association
     Vancouver, BC, Canada
     Monday, June 9, 2014
  2. Overview
     • A little bit about Machine Learning
     • A little bit about Taxonomies
     • A little bit about Big Data
     • A little bit about Hybrid Techniques
  3. NOT NEW: Machine Learning for CI
     Mena, Jesus. (1996). Data Mining for Competitive Intelligence, Competitive Intelligence Review, 7(4):18-25.
  4. Refinement of Machine Learning
     • Decision Trees/Classification
     • Clustering
     • Anomaly Detection
  5. Refinement of Machine Learning
     • Support Vector Machines
       – Predictive Classification
     • Association Rules
       – Marketbasket analysis
     • Natural Language Processing
       – Sentiment Analysis
  6. Getting up to Speed
     • http://efytimes.com
     • 6 Video Tutorials and Playlists on Machine Learning (January 2014)
  7. NOT NEW: Taxonomies in Information Retrieval
     http://comsaad.blogspot.com/p/old-computer-photos.html
     http://commons.wikimedia.org/wiki/File:A_Library_Primer_illustration_Joined_Hand.jpg
  8. Need for Taxonomic Structures
     http://farm9.staticflickr.com/8262/8673326413_4492b5dc68_o.jpg
  9. NOT NEW: Datasets
     http://www.conceptdraw.com/solution-park/resource/images/solutions/entity-relationship-diagram-(erd)/Diagramming-Crow's-Foot-ERD-Sample60.png
  10. Enter BIG DATA
      http://commons.wikimedia.org/wiki/File:DARPA_Big_Data.jpg
  11. BigData Sources and Analysis
      Data Type | Qualities | Analysis Tools | Result
      Social Media | Demographics | API integration | More profiles of like-minded users
      “Social Influencers” | User Reviews | NLP, Text Analysis | Sentiment readings
      “Internet of Things” | Logs/Sensors/Check-Ins | Parsing | Usage and behavior patterns
      SaaS | Cloud/Web-based/Subscription software | Dist. data integration/in-memory caching technology/API integration | Usage behavior patterns, customer data, etc.
      Public Data | e.g., Amazon Data Market, WorldBank, Wikipedia | All above (depends on data structure) | Depends on Dataset (and there are LOTS of them!)
      Hadoop/MapReduce | Volume! | Parallel Processing/Parsing/Reduction | Big patterns, correlations, needles in haystacks
      Data Warehouses | Internal transactional data | Likely same as above | Correlations, marketbasket, etc.
      NoSQL/Columnar | Volume! | Fills gaps in Parallel processing tools | Real time activity and patterns
      In-Stream Monitoring | Network traffic (streaming videos, system outages) | Packet evaluation, distributed query processing | Network/Stream usage patterns
      Legacy Data | Usually PDFs & Documents/SemiStructured | Transformation tools (e.g., Xenos d2e) + above | Depends on content (could be all)
      http://www.zdnet.com/top-10-categories-for-big-data-sources-and-mining-technologies-7000000926/
  12. Why “Concept Hierarchies” in an Unstructured Environment?
  13. Advantages
      • When a term's support is too low for it to appear in frequent itemsets/rulesets
      • Create more interesting rules using more general, aggregated concepts
        [DVD, wheat bread, home electronics, electronics, food]
      Kumar, T.S. (2005) Introduction to Data Science
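
The first advantage above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the cited text: the hierarchy, the baskets, and the `MIN_SUPPORT` value are all invented, and the helper names (`ancestors`, `support_counts`) are hypothetical. It shows how items that are individually too rare to count as frequent can become frequent once each basket is extended with its parent concepts.

```python
from collections import Counter

# Hypothetical concept hierarchy (child -> parent), invented for illustration.
PARENT = {
    "DVD": "home electronics",
    "headphones": "home electronics",
    "wheat bread": "bread",
    "white bread": "bread",
    "home electronics": "electronics",
    "bread": "food",
}

def ancestors(item):
    """Return every concept above `item` in the hierarchy, nearest first."""
    found = []
    while item in PARENT:
        item = PARENT[item]
        found.append(item)
    return found

def support_counts(transactions, generalize=False):
    """Count the number of transactions containing each item.

    With generalize=True, each basket is first extended with the
    ancestors of its items, so general concepts are counted too.
    """
    counts = Counter()
    for basket in transactions:
        items = set(basket)
        if generalize:
            for item in basket:
                items.update(ancestors(item))
        counts.update(items)
    return counts

# Made-up market baskets.
TRANSACTIONS = [
    ["DVD", "wheat bread"],
    ["headphones", "white bread"],
    ["DVD", "white bread"],
    ["headphones"],
]

MIN_SUPPORT = 3  # assumed threshold: an item must appear in at least 3 baskets

raw = support_counts(TRANSACTIONS)
rolled = support_counts(TRANSACTIONS, generalize=True)

frequent_raw = sorted(i for i, c in raw.items() if c >= MIN_SUPPORT)
frequent_rolled = sorted(i for i, c in rolled.items() if c >= MIN_SUPPORT)

print("Frequent without hierarchy:", frequent_raw)
print("Frequent with hierarchy:", frequent_rolled)
```

With these made-up baskets, no individual product reaches the threshold, while the aggregated concepts do, which is the situation the first bullet describes.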
  14. Disadvantages
      • How low and how high in the hierarchy do you set the threshold?
      • Increased computation time
      • If threshold is too high, redundant rules for more specific terms can be summarized by rules using more general terms
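
The threshold question can be made concrete with a small Python sketch. It rests on assumptions: the support counts per hierarchy level are invented, and the helpers `frequent_by_level` and `deepest_level_reached` are hypothetical names, not part of any library. The point is only that general concepts accumulate more support than specific ones, so the chosen threshold decides how far down the hierarchy the mined patterns can reach.

```python
# Hypothetical support counts per concept, grouped by hierarchy level
# (level 0 = most general). All numbers are invented for illustration.
SUPPORT_BY_LEVEL = {
    0: {"electronics": 40, "food": 55},
    1: {"home electronics": 30, "bread": 25, "dairy": 20},
    2: {"DVD": 12, "headphones": 8, "wheat bread": 6, "white bread": 9, "milk": 15},
}

def frequent_by_level(min_support):
    """Return, for each level, the concepts whose support meets the threshold."""
    return {
        level: sorted(c for c, n in counts.items() if n >= min_support)
        for level, counts in SUPPORT_BY_LEVEL.items()
    }

def deepest_level_reached(min_support):
    """Deepest hierarchy level that still has a frequent concept, or None."""
    levels = [lvl for lvl, items in frequent_by_level(min_support).items() if items]
    return max(levels) if levels else None

for threshold in (35, 20, 5):
    print(threshold, frequent_by_level(threshold))
```

A high threshold keeps only the top-level concepts, while a low one admits every specific term and with it more candidate rules to compute and review.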
  15. Hybrid Taxonomic Development
      • Understand your auto-classification model
      • Work with domain experts to create basic taxonomy
      • Test Taxonomy in the Model
      • Rinse, repeat
      Wendy Pohs, ASIS&T Bulletin, 12/1/13
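
One way to picture the test-and-repeat cycle above is the following Python sketch. It is not taken from the cited Bulletin article: the naive substring classifier, the sample documents, the taxonomy terms, and the function names (`classify`, `evaluate_taxonomy`) are all assumptions made for illustration. Documents the taxonomy fails to classify are the ones handed back to the domain experts.

```python
def classify(text, taxonomy):
    """Return the sorted categories whose terms appear in the text.

    Naive case-insensitive substring matching stands in for a real
    auto-classification model.
    """
    lowered = text.lower()
    return sorted(
        category
        for category, terms in taxonomy.items()
        if any(term in lowered for term in terms)
    )

def evaluate_taxonomy(documents, taxonomy):
    """Split documents into those the taxonomy classifies and those it misses."""
    classified, unclassified = {}, []
    for doc in documents:
        categories = classify(doc, taxonomy)
        if categories:
            classified[doc] = categories
        else:
            unclassified.append(doc)
    return classified, unclassified

# Made-up competitive intelligence snippets.
DOCUMENTS = [
    "Competitor announces new tablet pricing",
    "Patent filed for battery technology",
    "Rival firm opens Vancouver office",
]

# Basic taxonomy drafted with domain experts (hypothetical terms).
taxonomy = {
    "pricing": ["pricing", "discount"],
    "intellectual property": ["patent"],
}

# Test the taxonomy in the model and collect the gaps.
_, gaps_before = evaluate_taxonomy(DOCUMENTS, taxonomy)
print("Needs expert review:", gaps_before)

# Stand-in for the expert review step: a new category is added by hand.
taxonomy["expansion"] = ["opens", "office"]

# Rinse, repeat.
classified, gaps_after = evaluate_taxonomy(DOCUMENTS, taxonomy)
print("Still unclassified:", gaps_after)
```

The human step is deliberately a manual edit to the taxonomy here; the code only surfaces where the current version falls short.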
  16. Domain Knowledge and Thick Data
      • Thick Data analysis primarily relies on human brain power to process a small “N” while big data analysis requires computational power (of course with humans writing the algorithms) to process a large “N”.
      • Big Data reveals insights with a particular range of data points, while Thick Data reveals the social context of and connections between data points. Big Data delivers numbers; thick data delivers stories. Big data relies on machine learning; thick data relies on human learning.
      http://ethnographymatters.net/blog/2013/05/13/big-data-needs-thick-data/ (Tricia Wang)
  17. Data Driven CI is Meaningless Without Human/Domain Knowledge
      http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/
  18. Recap
      • Data Mining for CI is not new
      • Refinement and Improvement
      • Bigger, Weirder Data
  19. Recap
      • Where it’s at: Hybrid Schemas
      • Thick Data, not just Big Data
      • HUMAN ELEMENT IS ESSENTIAL
  20. Questions?
      Elaine Lasda Bergman
      University at Albany
      http://www.slideshare.net/librarian68
      elasdabergman@albany.edu
      @ElaineLibrarian
