Predictive modeling DBs

These are suggested databases for predictive modeling certification at DataVita.

  • 1. Predictive Modeling: Research Tasks Nilitis, LLC. © 2012
  • 2. 1. Netflix Database
    Netflix, Inc.: American provider of on-demand Internet streaming media and flat-rate DVD-by-mail.
    Training data set: 100,480,507 ratings; 480,189 users; 17,770 movies.
    Data set entry: <user (ID), movie (ID), date of grade (yyyy-mm-dd), grade (1-5)>
    Solutions: The BellKor Solution; Big Chaos Solution; Pragmatic Theory Solution.
  • 3. 1. Netflix Database
    User-based collaborative filtering:
    - Look for users who share the same rating patterns
    - Use the ratings from those users to calculate a prediction
    Item-based collaborative filtering:
    - Build an item-item matrix determining relationships between pairs of items
    - Using the matrix and the data on the current user, infer his taste…
    A note from the donor regarding Netflix data: "Thank you for your interest in the Netflix Prize dataset. The dataset is no longer available."
    Reference: "Robust De-anonymization of Large Sparse Datasets"
  • 4. 2. EEG Database Data Set
    Data from a large study to examine EEG correlates of genetic predisposition to alcoholism.
    64 electrodes placed on the subjects' scalps were sampled at 256 Hz for 1 second.
    There were two groups of subjects: alcoholic and control.
    Each subject was exposed to either a single stimulus (S1) or to two stimuli (S1 and S2).
    122 subjects; each subject completed 120 trials where different stimuli were shown.
    EEG / ERP data available for free public download.
  • 5. 2. EEG Database Data Set
    [Figure: example plots of a control and an alcoholic subject, from the webpage of Lester Ingber]
    Open question: use Ingber's Canonical Momentum Indicator or something else? Or raw data?
  • 6. 3. Berlin Database of Emotional Speech
    Basic emotions: anger, joy, sadness, fear, disgust and boredom, plus neutral speech.
    Ten professional native German actors (5 female and 5 male) simulated these emotions, producing 10 utterances (5 short and 5 longer sentences).
    Emotion was recognized by at least 80% of the listeners.
  • 7. 3. Berlin Database of Emotional Speech
    Voice Emotion Recognition: Audio Stream -> Feature Extraction -> Classifier -> Emotion
    Feature Extraction: "openEAR" settings from the openEAR "emobase" config files and articles; possibly add some feature selection steps (state of the art: sequential feature selection).
    Classifier: state of the art is SVM with polynomial or RBF kernel (libSVM is included in the openEAR package).
  • 8. 4. Wikipedia page-to-page link database
    Pages: 5,716,808
    Total links: 130,160,392
    Google PageRank technology:
    - 85% likelihood of choosing a random link from the page
    - 15% likelihood of jumping to a page chosen at random from the entire web
  • 9. 5. Detecting Malicious URLs
    About 2.4 million URLs, 3.2 million features.
    Estimating covariance matrix for high-dimensional data.
    Linear implementation of SVM (LIBLINEAR).
  • 10. 6. Pseudo Periodic Synthetic Time Series Data Set
    Plus Branch and Bound evaluation.
    Reference: "An Indexing Scheme for Fast Similarity Search in Large Time Series Databases"
  • 11. Other Datasets
    - Individual household electric power consumption Data Set
    - Marketing Data Set
    - Flare Data Set
    - Fires Data Set
    - Data Set and Crime Data Set
    - Income Data Set
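
The item-based collaborative filtering outlined on slide 3 can be sketched as follows. This is a minimal illustration, not the method of any Netflix Prize solution: the toy ratings, the choice of cosine similarity, and all function names are assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: grade on the 1-5 scale}.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "C": 1},  # u4 has not graded movie B
}

def item_vectors(ratings):
    """Invert user -> movie grades into movie -> {user: grade}."""
    items = {}
    for user, graded in ratings.items():
        for item, grade in graded.items():
            items.setdefault(item, {})[user] = grade
    return items

def cosine(a, b):
    """Cosine similarity over the users who graded both movies."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def predict(ratings, user, target):
    """Similarity-weighted average of the user's own grades."""
    items = item_vectors(ratings)
    if target not in items:
        return None
    num = den = 0.0
    for item, grade in ratings[user].items():
        if item == target:
            continue
        sim = cosine(items[target], items[item])  # one item-item matrix entry
        num += sim * grade
        den += abs(sim)
    return num / den if den else None

prediction = predict(ratings, "u4", "B")
```

With these toy ratings, the predicted grade for user u4 on movie B lands between u4's two known grades and is pulled towards the grade given to the more similar movie A. Raw cosine on unnormalised grades makes every similarity positive; subtracting each user's mean grade first is a common refinement.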
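
The 85% / 15% random-surfer rule on slide 8 can be illustrated with a short power iteration. This is only a sketch on a hypothetical four-page graph; the function name, the iteration count, and the handling of pages without outgoing links are assumptions, not details taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration for the random-surfer model.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # 15% of the time the surfer jumps to a page chosen at random.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # 85% of the time the surfer follows a random link on the page.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new[t] += share
        rank = new
    return rank

# Hypothetical link graph, tiny compared with the Wikipedia link database.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

In this toy graph page C collects links from A, B and D and ends up with the highest rank, while D, which nothing links to, keeps only the random-jump share of 0.15 / 4.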
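
The two SVM kernels named on slide 7 can be written out directly. The sketch below follows the kernel forms documented for LIBSVM, (gamma * u'v + coef0)^degree for the polynomial kernel and exp(-gamma * |u - v|^2) for the RBF kernel; the function names and default parameter values are illustrative assumptions, and the feature vectors are made up rather than extracted with openEAR.

```python
from math import exp

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)

# Hypothetical two-dimensional feature vectors.
u = [1.0, 2.0]
v = [3.0, 4.0]
poly_value = polynomial_kernel(u, v, degree=2, gamma=1.0, coef0=1.0)
rbf_value = rbf_kernel(u, v, gamma=0.5)
```

The RBF kernel equals 1 for identical vectors and decays towards 0 as they move apart, so gamma controls how local the classifier is; the polynomial kernel instead grows with the dot product of the two vectors.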