Slideshare.net (beta)

 
Post to TwitterPost to Twitter
Post: 
Myspace Hi5 Friendster Xanga LiveJournal Facebook Blogger Tagged Typepad Freewebs BlackPlanet gigya icons

All comments

Add a comment on Slide 1

Login or Signup to add a comment!


Showing 1-50 of 0 (more)

Query Directed Data Mining

From dcancel, 6 months ago

Techniques using Python & Parallel Processing

210 views  |  0 comments  |  0 favorites  |  7 downloads  |  2 embeds (Stats)
 

Categories

Add Category
 
 

Groups / Events

 

 
Embed
options

More Info

This slideshow is Public
Total Views: 210
on Slideshare: 188
from embeds: 22

Slideshow transcript

Slide 1: Query-Directed Data Mining Techniques using Python & Parallel Processing Christopher Gillett Chief Software Architect Compete, Inc. Copyright © 2000-2005 Compete™ All Rights Reserved. 1

Slide 2:Background Compete, Inc. analyzes large amounts of data (gigabytes per day), and accrues terabytes of data every year supporting its predictive analysis business • Tera-scale storage requirements • Massive data processing needs • Problem: Ad-hoc research against these large data sources Technology platform is Unix clump/cluster/grid of 60+ machines • Job level parallelism managed using Portable Batch System • Storage is NFS with dedicated NFS servers • Storage managed by Compete File System (CFS – presented earlier at PyCon 2005). • Application code written in a variety of languages: C, C++, Java, Python, etc.