idalab seminar #8 Data Science meets Art! - Predicting bestselling books

Content matters! Spotting the next bestselling novel that customers will love is a challenging task for many people in the publishing industry. Smart data science methods can derive insights such as topics, sentiment or entities from an unknown fiction manuscript. Still, the question remains how the art of writing goes together with machine learning when it comes to predicting a text's chances of becoming the next “Harry Potter” or “Fifty Shades of Grey”. The QualiFiction team is happy to share their experience with these challenges from data science, technical and business perspectives.

  1. Agency for Data Science (Machine learning & AI, Mathematical modelling, Data strategy). Dr. Ralf Winkler: Data Science meets Art! - Predicting bestselling books. idalab seminar #8 | March 2nd 2018
  2. Data Science meets Art! Predicting bestselling books. Ralf Winkler, QualiFiction GmbH. Talk at idalab seminar #8, Berlin
  3. Facts & Figures about QualiFiction: based in Berlin & Hamburg; founded in August 2017; targeting the German book market first.
  4. Vision & Mission: “Sharing a common passion for literature, we want to foster creativity and quality as well as the success of books throughout their whole lifecycle of being written, published and read. We find joy in innovative, data-driven methods to address the challenge of publishing books in a new way.”
  5. Our team
       • Gesa Schöning, Managing Director Business Development & Sales
       • Ralf Winkler, Managing Director Product & Development
       • Anna Ira Hurnaus, Software Developer Backend
  6. What this talk is about: Engineering Setup & Tech Stack; Statistical Learning; Data Science meets customers; Business Context (the 'why' and 'what').
  7. Product(s) in a nutshell
       • Input: a new, unseen manuscript from a (fiction) author.
       • The publishing house asks: Will this novel be successful? Does this novel fit our image/brand/strategy?
       • Analysis by Literature-Screening & Analytics (LiSA): “Topics”: 20% crime, 30% sex, ...; “Mean sentiment”: -0.3, ...; “Main characters”: father, son, ...
       • Prediction by the Bestseller Score & sales forecast (F/C): 85% chance of becoming a bestseller; expected 23k books sold within the first year!
       • Processing time: ~30 s.
  8. (Current) Main Components of LiSA
       • Topics analysis & comparison: What are the main topics of the novel?
       • Sentiment analysis & comparison: Does this novel have a happy ending? What is the general mood of the document?
       • Style & Statistics: How can the literary style of the novel be characterized?
       • Entities: Who are the characters involved? Where does the action take place?
  9. Topics Analysis via LDA: How to write novels
       • Preparation: decide upon the 400 most important topics (e.g. Love, Violence, Music, Nature, Family, ...) and create a word distribution for each topic, e.g. φ_Nature, the word distribution of the topic ‘Nature’ (tree, bird, ice-cream, ...).
       • Writing: decide upon a ‘topic cocktail’ θ for your novel (e.g. a mix of Love, Violence and Nature).
       • Randomly create words: sample from the topic cocktail θ to get a topic z (e.g. z = ‘Nature’), then sample from the word distribution φ_Nature to get a word w (e.g. ‘bird’).
  10. Topics Analysis via LDA: The Bayesian way of writing it
       • Bayesian network (plate notation) for one novel with 100,000 words: α → θ → z → w ← φ ← β, with a “sparse” Dirichlet prior on θ (α) and on φ (β); z and w sit in a plate of 100,000 words, φ in a plate of 400 topics.
       • ... and for a whole bunch of 5,000 documents: the same network with an additional plate of 5,000 documents around θ, z and w.
       • This is a generative model. LDA means turning it upside down, i.e. inferring the unknown parameters θ and φ from the observations w (the words).
       • Mechanically, this is done by Gibbs sampling: inference is made on the hidden variable z, from which θ and φ can easily be derived.
  11. Topics Analysis via LDA: From theory to practice. Real-world example (figure): the top 50 words of φ_123. Indentation indicates how specific each word is to that topic.
  12. Sentiment Analysis via deep neural nets
       • Example input: “Es war der Beginn einer wunderbaren Freundschaft.” (“It was the beginning of a beautiful friendship.”); each word is mapped to an embedding vector.
       • Embedding: 300-dimensional word vectors from our own Word2Vec model, trained on fiction documents.
       • Network: LSTM(64) (incl. dropouts) → Dropout(0.1) → Dense layer with softmax activation.
       • Output: class probabilities, e.g. negative 0.05, neutral 0.15, positive 0.8.
       • Trained on 3,000 manually labelled sentences. Done with Keras + TensorFlow. Much room for improvement!
  13. The Bestseller Score: Features & Targets
       • Features
           – Content features from LiSA: topics, sentiment slope, statistics and style, entities
           – Metadata: cover, author reputation, material
           – Economics: price, marketing spending
           – Environment: trends, innovation component, cannibalisation
       • Targets
           – Binary success indicator: probability of the book entering a SPIEGEL bestseller list for n weeks or longer
           – Sales forecast: books sold during the first 12 months after the first publication date
  14. The Bestseller Score: Remarks & Challenges
       1. Given the (still) noisy data, the results are encouraging.
       2. Challenge of imbalanced classes, since only a few books become bestsellers.
       3. Classification performance (particularly given the class imbalance) is neither easy to measure nor easy to communicate.
       4. Desire for (local) interpretability, i.e. answering the question: “Why did the machine come to this conclusion?” (Figure: score, sentiment over time t, mean sentiment, curvature, main topic, ...)
       5. Work in progress: choice of an adequate machine learning philosophy: a deep learning approach, neural feature creation on LiSA output, or handcrafted feature engineering.
  15. Demo time
  16. Thank you for listening! We are hiring: Data Scientist; Frontend Developer & UX. Contact: jobs(at)qualifiction.de, www.qualifiction.de
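Slide 8 lists “Style & Statistics” as a LiSA component but does not say which statistics are used. Below is a minimal sketch that assumes three common surface measures (average sentence length in words, average word length, type-token ratio) as stand-ins. The function name and the choice of metrics are hypothetical and do not represent LiSA's actual feature set.

```python
import re


def style_statistics(text):
    """Hypothetical style features for a manuscript (not LiSA's actual features)."""
    # Sentences: split after terminal punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words: runs of letters only (Unicode-aware, so German umlauts are kept).
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        "avg_sentence_len": len(words) / len(sentences),      # words per sentence
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),     # vocabulary richness
    }
```

On a full novel, the type-token ratio depends strongly on text length. A real system would probably compute it over fixed-size windows, but the slides do not specify this.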
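The “how to write novels” recipe on slide 9 is LDA's generative story. The sketch below runs it forwards for a toy model. It first draws a topic cocktail θ from a symmetric Dirichlet prior. Then, for every word, it draws a topic z from θ and a word w from φ_z. The three topics and their word probabilities in `TOY_PHI` are invented for illustration, and `write_novel` is a hypothetical name.

```python
import random

# Invented toy word distributions phi_k (the real model has 400 topics).
TOY_PHI = {
    "Nature": {"tree": 0.5, "bird": 0.4, "ice-cream": 0.1},
    "Love": {"kiss": 0.6, "heart": 0.4},
    "Violence": {"gun": 0.7, "blood": 0.3},
}


def sample_dirichlet(alphas, rng):
    # A Dirichlet draw is a vector of independent Gamma(alpha_k, 1) draws, normalised.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]


def write_novel(phi, n_words, alpha=0.5, seed=0):
    """Generate (topic, word) pairs following LDA's generative process."""
    rng = random.Random(seed)
    topics = list(phi)
    theta = sample_dirichlet([alpha] * len(topics), rng)   # the novel's topic cocktail
    words = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]          # topic z ~ theta
        vocab = list(phi[z])
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab])[0]  # word w ~ phi_z
        words.append((z, w))
    return dict(zip(topics, theta)), words
```

For example, `write_novel(TOY_PHI, n_words=20)` returns the sampled cocktail θ as a dict and 20 (topic, word) pairs. A concentration `alpha` below 1 makes the cocktail sparse, so a few topics dominate.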
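The network on slide 10 can be written out in the usual notation for smoothed LDA. This sketch makes several assumptions:

- The priors α and β are symmetric.
- There are K = 400 topics, D = 5,000 documents, N_d words in document d, and a vocabulary of size V.
- n_{d,k}, n_{k,v} and n_k count the words of document d assigned to topic k, the assignments of word v to topic k, and all words assigned to topic k.
- ¬(d,n) marks a count taken with word n of document d left out.

The conditional below is the standard collapsed Gibbs update. The slides do not say which sampler variant QualiFiction uses.

```latex
\begin{aligned}
\varphi_k &\sim \operatorname{Dir}(\beta), && k = 1,\dots,K \\
\theta_d &\sim \operatorname{Dir}(\alpha), && d = 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \operatorname{Cat}(\theta_d), && n = 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \operatorname{Cat}\bigl(\varphi_{z_{d,n}}\bigr) \\[6pt]
p\bigl(z_{d,n} = k \mid \mathbf{z}_{\neg(d,n)}, \mathbf{w}\bigr) &\propto
  \bigl(n_{d,k}^{\neg(d,n)} + \alpha\bigr)\,
  \frac{n_{k,w_{d,n}}^{\neg(d,n)} + \beta}{n_{k}^{\neg(d,n)} + V\beta} \\[6pt]
\hat\theta_{d,k} &= \frac{n_{d,k} + \alpha}{N_d + K\alpha},
\qquad
\hat\varphi_{k,v} = \frac{n_{k,v} + \beta}{n_k + V\beta}
\end{aligned}
```

The last line is the “easily derived” step: once the assignments z are sampled, θ and φ are smoothed count ratios.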
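Slide 11 shows the top 50 words of one inferred topic. The following is a minimal, dependency-free collapsed Gibbs sampler for LDA, followed by a helper that lists a topic's most probable words. It is an illustrative sketch: the function names and default hyperparameters are assumptions, and pure Python is far too slow for thousands of novels. The slide's indentation-based word specificity is not described, so it is not reproduced here.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised docs (lists of words).

    Returns (theta, phi, vocab) with theta[d][k] and phi[k][v] as smoothed count ratios.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    topics = range(K)
    n_dk = [[0] * K for _ in docs]       # words of doc d assigned to topic k
    n_kv = [[0] * V for _ in topics]     # times word v is assigned to topic k
    n_k = [0] * K                        # all words assigned to topic k
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)         # random initial assignment
            z_d.append(k)
            n_dk[d][k] += 1
            n_kv[k][index[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = index[w], z[d][n]
                # Remove the current token from the counts (the "not (d, n)" counts).
                n_dk[d][k] -= 1
                n_kv[k][v] -= 1
                n_k[k] -= 1
                # Full conditional for this token's topic, up to normalisation.
                weights = [(n_dk[d][j] + alpha) * (n_kv[j][v] + beta) / (n_k[j] + V * beta)
                           for j in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][n] = k
                n_dk[d][k] += 1
                n_kv[k][v] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(n_kv[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in topics]
    return theta, phi, vocab


def top_words(phi_k, vocab, n=50):
    """The n most probable words of one topic (cf. 'top 50 of phi_123')."""
    ranked = sorted(range(len(vocab)), key=lambda v: phi_k[v], reverse=True)
    return [vocab[v] for v in ranked[:n]]
```

As a toy usage, `lda_gibbs([["tree", "bird", "river"] * 10, ["gun", "blood", "knife"] * 10], n_topics=2)` returns θ, φ and the vocabulary. `top_words(phi[0], vocab, n=3)` then lists topic 0's three most probable words.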
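The sentiment model on slide 12 was built with Keras + TensorFlow. To show what that layer stack computes without depending on a framework, here is a dependency-free sketch of the inference pass: embedding lookup, one LSTM layer, and a dense softmax over negative/neutral/positive. The weights are random and untrained, and the random embeddings stand in for the Word2Vec vectors. The class name is hypothetical. The sketch illustrates the data flow, not the trained QualiFiction model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyLSTMSentiment:
    """Embedding -> LSTM -> Dense(softmax) forward pass with random, untrained weights."""

    LABELS = ("negative", "neutral", "positive")

    def __init__(self, vocab, emb_dim=300, hidden=64, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.index = {w: i for i, w in enumerate(vocab)}
        self.emb = mat(len(vocab), emb_dim)        # stand-in for Word2Vec vectors
        self.hidden = hidden
        # One weight matrix per gate (input, forget, output, candidate) over [x_t; h_{t-1}].
        self.W = {gate: mat(hidden, emb_dim + hidden) for gate in "ifoc"}
        self.b = {gate: [0.0] * hidden for gate in "ifoc"}
        self.W_out = mat(len(self.LABELS), hidden)
        self.b_out = [0.0] * len(self.LABELS)

    def predict(self, tokens):
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for tok in tokens:
            xh = self.emb[self.index[tok]] + h     # list concatenation [x_t; h_{t-1}]
            pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], xh), self.b[gate])]
                   for gate in "ifoc"}
            i = [sigmoid(v) for v in pre["i"]]     # input gate
            f = [sigmoid(v) for v in pre["f"]]     # forget gate
            o = [sigmoid(v) for v in pre["o"]]     # output gate
            cand = [math.tanh(v) for v in pre["c"]]  # candidate cell state
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Dropout is inactive at inference time, so it does not appear here.
        logits = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
        return dict(zip(self.LABELS, softmax(logits)))
```

For a quick run on the slide's example sentence, smaller dimensions can be passed, e.g. `TinyLSTMSentiment("Es war der Beginn einer wunderbaren Freundschaft".split(), emb_dim=8, hidden=4)`. With untrained weights the resulting probabilities are meaningless; only their shape matches the slide.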
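Slide 13 names “sentiment slope” as a content feature, and slide 14 mentions mean sentiment and curvature. The slides do not define these quantities. The sketch below assumes they summarise per-sentence sentiment scores in reading order:

- Positions are rescaled to [0, 1].
- The slope is that of the least-squares line.
- The curvature is the leading coefficient of the least-squares quadratic.

These definitions and all names are hypothetical.

```python
def _solve3(aug):
    """Solve a 3x3 linear system given as an augmented 3x4 matrix (partial pivoting)."""
    m = [row[:] for row in aug]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def sentiment_trajectory_features(scores):
    """Summarise per-sentence sentiment scores (reading order) as mean, slope, curvature."""
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three sentences")
    xs = [i / (n - 1) for i in range(n)]   # position in the novel, rescaled to [0, 1]
    mean = sum(scores) / n
    # Linear least squares: slope = cov(x, y) / var(x).
    mx = sum(xs) / n
    slope = (sum((x - mx) * (y - mean) for x, y in zip(xs, scores))
             / sum((x - mx) ** 2 for x in xs))
    # Quadratic least squares y ~ a*x^2 + b*x + c via the normal equations.
    s = [sum(x ** p for x in xs) for p in range(5)]              # sums of x^0 .. x^4
    t = [sum(y * x ** p for x, y in zip(xs, scores)) for p in range(3)]
    a, _, _ = _solve3([[s[4], s[3], s[2], t[2]],
                       [s[3], s[2], s[1], t[1]],
                       [s[2], s[1], s[0], t[0]]])
    return {"mean_sentiment": mean, "sentiment_slope": slope, "curvature": a}
```

With a classifier like the one on slide 12, the positive-minus-negative probability of each sentence could serve as the score. This yields one (mean, slope, curvature) triple per novel.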
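Slide 14 notes that class imbalance makes classification performance hard to measure and communicate. Here is a small, hypothetical illustration. With 5 bestsellers among 100 books, a model that never predicts a bestseller still reaches 95% accuracy. This is why recall, precision, F1 or balanced accuracy are usually reported alongside accuracy. The numbers are made up and are not QualiFiction results.

```python
def imbalanced_report(y_true, y_pred):
    """Confusion-matrix metrics for a binary 'bestseller' label (1 = bestseller)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(a, b):
        return a / b if b else 0.0           # define 0/0 as 0 for empty classes

    precision = ratio(tp, tp + fp)           # share of predicted bestsellers that are real
    recall = ratio(tp, tp + fn)              # share of real bestsellers that were caught
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }
```

For the always-“no” model, `imbalanced_report([1] * 5 + [0] * 95, [0] * 100)` gives accuracy 0.95 but recall 0.0 and balanced accuracy 0.5.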
