Predictive modeling DBs

These are suggested databases for the predictive modeling certification at DataVita.

Transcript

  • 1. Predictive Modeling: Research Tasks Nilitis, LLC. © 2012
  • 2. 1. Netflix Database
    http://cms.uhd.edu/faculty/chenp/class/4319/project/netflixfiles.html
    Netflix, Inc. is an American provider of on-demand Internet streaming media and flat-rate DVD-by-mail.
    Training data set:
      - 100,480,507 ratings
      - 480,189 users
      - 17,770 movies
    Data set entry: <user (ID), movie (ID), date of grade (yyyy-mm-dd), grade (1-5)>
    The BellKor Solution: http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
    The Big Chaos Solution: http://www.netflixprize.com/assets/GrandPrize2009_BPC_BigChaos.pdf
    The Pragmatic Theory Solution: http://www.netflixprize.com/assets/GrandPrize2009_BPC_PragmaticTheory.pdf
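The data set entry format on this slide can be sketched as a small record type. This is only an illustration: the comma-separated line layout, the `Rating` class, and the `parse_rating` helper are assumptions made for this example, not the actual file format of the Netflix Prize distribution.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rating:
    """One entry: <user (ID), movie (ID), date of grade, grade (1-5)>."""
    user_id: int
    movie_id: int
    graded_on: date
    grade: int


def parse_rating(line: str) -> Rating:
    """Parse a line such as '1, 2, 2005-09-06, 3' (assumed layout)."""
    cleaned = line.strip().strip("<>")
    user, movie, day, grade = (field.strip() for field in cleaned.split(","))
    grade_value = int(grade)
    if not 1 <= grade_value <= 5:
        raise ValueError(f"grade must be between 1 and 5, got {grade_value}")
    # date.fromisoformat accepts the yyyy-mm-dd form named on the slide.
    return Rating(int(user), int(movie), date.fromisoformat(day), grade_value)


if __name__ == "__main__":
    # The values below are made up for illustration.
    print(parse_rating("1, 2, 2005-09-06, 3"))
```

Rejecting grades outside 1-5 at parse time keeps malformed rows out of any later rating matrix.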
  • 3. 1. Netflix Database
    User-based collaborative filtering
      - Look for users who share the same rating patterns
      - Use the ratings from those users to calculate a prediction
    Item-based collaborative filtering
      - Build an item-item matrix determining relationships between pairs of items
      - Using the matrix and the data on the current user, infer his taste…
    A note from the donor regarding Netflix data: "Thank you for your interest in the Netflix Prize dataset. The dataset is no longer available."
    Robust De-anonymization of Large Sparse Datasets: http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
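The two user-based steps above (find users with similar rating patterns, then combine their ratings into a prediction) can be sketched as follows. This is a minimal illustration, not any of the prize-winning solutions: cosine similarity over co-rated movies is one assumed choice of similarity measure, and the function names and toy ratings are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two users over the movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, movie):
    """Similarity-weighted average of other users' grades for `movie`.

    Returns None when no other user has rated the movie.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[movie]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None


if __name__ == "__main__":
    # Toy data, made up for illustration: user -> {movie: grade (1-5)}.
    toy = {
        "u1": {"m1": 5, "m2": 4},
        "u2": {"m1": 5, "m2": 4, "m3": 5},
        "u3": {"m1": 1, "m2": 2, "m3": 1},
    }
    print(predict(toy, "u1", "m3"))
```

Because raw grades are all positive, cosine similarity separates users only weakly; subtracting each user's mean grade before comparing is a common refinement.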
  • 4. 2. EEG Database Data Set
    http://archive.ics.uci.edu/ml/datasets/EEG+Database
    This data comes from a large study examining EEG correlates of genetic predisposition to alcoholism.
    64 electrodes were placed on the subjects' scalps and sampled at 256 Hz for 1 second.
    There were two groups of subjects: alcoholic and control.
    Each subject was exposed to either a single stimulus (S1) or to two stimuli (S1 and S2).
    122 subjects; each subject completed 120 trials in which different stimuli were shown.
    EEG / ERP data available for free public download: http://sccn.ucsd.edu/~arno/fam2data/publicly_available_EEG_data.html
  • 5. 2. EEG Database Data Set
    [Figure: example plots of a control and an alcoholic subject; panels: Control, Alcoholic]
    http://www.ingber.com/ (webpage of Lester Ingber)
    Use Ingber's Canonical Momentum Indicator or something else? Or raw data?
  • 6. 3. Berlin Database of Emotional Speech
    http://database.syntheticspeech.de/
    6 basic emotions: anger, joy, sadness, fear, disgust and boredom + neutral speech
    Ten professional native German actors (5 female and 5 male) simulated these emotions, producing 10 utterances (5 short and 5 longer sentences)
    Emotion was recognized by at least 80% of the listeners
  • 7. 3. Berlin Database of Emotional Speech
    Voice Emotion Recognition: Audio Stream -> Feature Extraction -> Classifier -> Emotion
    Feature Extraction: "openEAR" http://sourceforge.net/projects/openart/?source=dlp
      - Take settings from the openEAR "emobase" config files and articles
      - Possibly add some feature selection steps (state of the art: sequential feature selection)
    Classifier: the state of the art is an SVM with a polynomial or RBF kernel (libSVM is included in the openEAR package)
  • 8. 4. Wikipedia page-to-page link database
    http://haselgrove.id.au/wikipedia.htm
    Total pages: 5,716,808
    Total links: 130,160,392
    Google PageRank technology: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.5427
      - 85% likelihood of choosing a random link from the page
      - 15% likelihood of jumping to a page chosen at random from the entire web
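The 85% / 15% random-surfer rule above can be sketched as a simple power iteration. This is a minimal illustration on a toy graph, not a pipeline for the 130-million-link database: the `pagerank` function name, the dict-of-lists graph format, the fixed iteration count, and the even redistribution of rank from pages without outgoing links are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.

    damping=0.85 is the chance of following a random link on the page;
    the remaining 0.15 is the chance of jumping to a random page.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the random-jump share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed handling of pages with no outgoing links:
                # spread their rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy link graph, made up for illustration.
    toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(toy_links).items()):
        print(page, round(score, 4))
```

At the scale of the Wikipedia link database, the same update would normally run over a sparse matrix or streamed edge list instead of Python dictionaries.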
  • 9. 5. Detecting Malicious URLs
    http://sysnet.ucsd.edu/projects/url/
      - About 2.4 million URLs
      - 3.2 million features
    Estimating the covariance matrix for high-dimensional data
    Linear implementation of SVM (LIBLINEAR)
  • 10. 5. Pseudo Periodic Synthetic Time Series Data Set
    http://archive.ics.uci.edu/ml/datasets/Pseudo+Periodic+Synthetic+Time+Series
    + Branch and Bound evaluation
    An Indexing Scheme for Fast Similarity Search in Large Time Series Databases: http://www.cs.rutgers.edu/~pazzani/Publications/ssdb99.pdf
  • 11. Other Datasets
      - Individual household electric power consumption Data Set: http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption
      - Bank Marketing Data Set: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
      - Solar Flare Data Set: http://archive.ics.uci.edu/ml/datasets/Solar+Flare
      - Forest Fires Data Set: http://archive.ics.uci.edu/ml/datasets/Forest+Fires
      - Arrhythmia Data Set: http://archive.ics.uci.edu/ml/datasets/Arrhythmia
      - Communities and Crime Data Set: http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized
      - Census Income Data Set: http://archive.ics.uci.edu/ml/datasets/Census+Income