Slideshow transcript
Slide 1: Sara Wood www.swivel.com
Slide 2: vicious data cycle Apathetic People The World’s Indifferent Data Leaders
Slide 3: two types of data in the world: public and private • public: Citizens, Politicians; data from Statistical Offices • private: Shareholders & Directors, CEOs; data from Sales, Operations, Finance, Products, etc.
Slide 4: data today
Slide 5: problem 1 • no effective way to know what datasets exist in the public domain on a particular topic • the best data on temperature worldwide for the past century?
Slide 6: problem 2 • even if one can identify which data exist on a topic, a different standard is often used for reporting, along with a different data architecture and different ways to describe data set documentation
Slide 7: problem 3 • some data that make it to the public domain are only in the public domain temporarily. • the organization that collected the data may not support the data archive except for a short period of time.
Slide 8: problem 4 • enormous volumes of data are only available in inaccessible formats: – hardcopy from government annual reports or statistical digests that have never been digitized – custom-built web applications which render data in set formats or structures
Slide 9: problem 5 • many datasets are collected but never put in the public domain. • the reasons include: extra costs and effort of putting the data in a public archive, providing documentation and the tendency for some groups to see datasets as their personal property.
Slide 10: technology today
Slide 11: how are computers made intuitive? • old answer: wizard • new answer: surfing
Slide 12: wizard SAS Enterprise Guide http://sas.com
Slide 13: surfing gain insights, share them
Slide 14: what do you do with your computer? • old answer: run software • new answer: meet people, collaborate
Slide 15: run software BusinessObjects XI http://businessobjects.com
Slide 16: meet people and collaborate
Slide 17: where do you discuss ideas? • old answer: PowerPoint meeting • new answer: Internet
Slide 18: PowerPoint meeting Microsoft PowerPoint http://microsoft.com
Slide 19: Internet
Slide 20: when do you get useful features? • old answer: software release cycle • new answer: web speed
Slide 21: software release cycle
Slide 22: web speed swivel geography
Slide 23: web site solutions • old answer: we’ll build it ourselves • new answer: value comes from community
Slide 25: Visualization by Matt Hurst
Slide 27: swivel’s mission make data useful so people share insights, make great decisions and improve life.
Slide 28: what is swivel? web2.0 for data. swivel makes it easy for people to understand, use and collaborate on data so they can share insights and make great decisions.
Slide 29: virtuous data cycle Engaged People The World’s Accountable Data Leaders
Slide 30: what’s in it for int’l agencies? (a few ideas) • meet mandate of broader dissemination: new & different audience • legal framework for encouraging creativity • metrics on your data and how it is used • develop community, dialogue and feedback • better and more creative data stewardship • data cleaning, standardization, visualizations and more
Slide 31: demo
Slide 32: company
Slide 33: two types of data in the world: public and private • public: Citizens, Politicians; data from Statistical Offices • private: Shareholders & Directors, CEOs; data from Sales, Operations, Finance, Products, etc.
Slide 34: business model compare your private data with the world’s public data public private
Slide 35: business model: compare your private data with the world’s public data • swivel public: all the data exploration and visualizations at swivel are free to everyone and licensed under a creative commons attribution license • swivel private: for a monthly subscription fee businesses can explore and share their data while keeping it private and secure
Slide 36: company dna data + web + enterprise software swivel management team Brian Mulloy, Co-founder & CEO Dmitry Dimov, Co-founder & Product Chief Sara Wood, Chief Data Officer & VP exclusive investor & primary advisor Minor Ventures Halsey Minor, Founder of CNET, news.com, etc.
Slide 38: joe hellerstein sfbay data mining sig, 080807
Slide 39: connections • related research at berkeley • control: interactive data analysis – online aggregation and mining – potter’s wheel data xformation & cleaning
Slide 41: connections • related research at berkeley • control: interactive data analysis – online aggregation and mining – potter’s wheel data xformation & cleaning • federated facts and figures (FFF 2000)
Slide 43: agenda • swivel: technology and desire • many data mining challenges • reaching out to researchers
Slide 44: 3 stages liberating data (upload/import) exploiting aggregation leveraging community technical challenges in each
Slide 45: liberating user data: structure
Slide 46: liberating user data: structure • challenges – a simple structural algebra • accommodates non-relational data • not a turing-complete nightmare – visually intuitive • affordances encouraging (recognizing) “good” formats • transparency of cause and effect – api-friendly – role of automation?
Slide 47: liberating user data 2: content
Slide 48: content challenges • data formatting – structure at the cell level • data cleaning – canonicalization, record linkage (reference reconciliation, object ID, merge/purge) – outlier detection • old (hard) problems and new twists
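As a rough illustration of the canonicalization and merge/purge items on this slide, the sketch below normalizes free-text labels and groups records that collapse to the same key. The function names `canonicalize` and `merge_purge` and the normalization rules (accent stripping, lowercasing, punctuation removal) are assumptions chosen for the example, not anything described in the talk; real record linkage usually needs fuzzier matching than exact key equality.

```python
import re
import unicodedata
from collections import defaultdict


def canonicalize(label):
    """Reduce a label to a comparison key (hypothetical rules, for illustration)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and replace anything that is not a letter, digit or space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", no_accents.lower())
    # Collapse runs of whitespace.
    return " ".join(cleaned.split())


def merge_purge(labels):
    """Group raw labels that share a canonical key."""
    groups = defaultdict(list)
    for label in labels:
        groups[canonicalize(label)].append(label)
    return dict(groups)


groups = merge_purge(["United States", "united  states.", "UNITED-STATES", "France"])
print(groups)
```

Exact-key grouping only catches formatting variants; spelling variants and abbreviations are the "old (hard) problems" part.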
Slide 49: types: a piece of the puzzle • column type induction • community code books
Slide 50: mdl type induction • best type = best compression $\mathrm{dl}(\mathit{coltype}) = \mathit{matches} \cdot \log(|\mathit{type}|) + 8 \sum_i \mathit{mismatch}_i.\mathit{len}$ – balances against overfitting – works for opaque types • challenges – non-categorical types – composite types – lots and lots of types
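A minimal sketch of the "best type = best compression" idea, reading the slide's formula as: each value that matches a type costs log2(|type|) bits, and each value that does not match costs 8 bits per character. The candidate types, their sizes, and the names `CANDIDATE_TYPES`, `description_length` and `induce_type` are all hypothetical, made up for this illustration.

```python
import math
import re

# Hypothetical candidate types: name -> (matcher, number of values in the type).
CANDIDATE_TYPES = {
    "digit": (re.compile(r"[0-9]").fullmatch, 10),
    "year": (re.compile(r"(19|20)[0-9]{2}").fullmatch, 200),
    "region_code": (lambda s: s in {"CA", "NY", "TX", "WA"}, 4),
    # Baseline: matches nothing, so every character is stored raw.
    "text": (lambda s: False, 1),
}


def description_length(values, matcher, type_size):
    """Bits needed to encode the column under one candidate type."""
    matches = sum(1 for v in values if matcher(v))
    mismatch_chars = sum(len(v) for v in values if not matcher(v))
    return matches * math.log2(type_size) + 8 * mismatch_chars


def induce_type(values):
    """Return the candidate type with the shortest description length."""
    scores = {
        name: description_length(values, matcher, size)
        for name, (matcher, size) in CANDIDATE_TYPES.items()
    }
    return min(scores, key=scores.get), scores


best, scores = induce_type(["1999", "2004", "2007", "n/a"])
print(best, scores)
```

The log(|type|) term is what guards against overfitting: a very broad type matches everything but pays more bits per match, while a narrow type pays the 8-bits-per-character penalty on whatever it fails to match.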
Slide 51: 3 stages liberating data (upload/import) exploiting aggregation leveraging community technical challenges in each
Slide 52: exploiting aggregation: graphscape
Slide 53: swivel basics • graphs are not created, they exist – have intrinsic identity – easily shared – declarative: malleable/composable • naturally knits graphs into the web – independent of image formats, etc. – this will be key • highlights mining opportunities
Slide 54: a simple graphscape • features of an excel graph? – data (points and labels) – visual semantics • coordinate space • marks • connectivity of marks • relationships between multiple series • legends – decorative attributes (“bling”) • proximity in this feature space? photo: My Hobo Soul (flickr)
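The slide leaves "proximity in this feature space?" as an open question. One possible answer, sketched here purely as an assumption, is to flatten each graph into a set of features (data columns, coordinate space, mark type, connectivity) and use Jaccard distance between the sets. `GraphSpec`, `features` and `jaccard_distance` are hypothetical names; decorative attributes and legends are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphSpec:
    """Hypothetical declarative description of a graph."""
    columns: frozenset      # data series plotted
    coordinate_space: str   # e.g. "cartesian", "polar"
    mark: str               # e.g. "point", "bar"
    connected: bool         # whether marks are joined by lines


def features(graph):
    """Flatten a graph spec into a set of string features."""
    feats = {f"col:{c}" for c in graph.columns}
    feats.add(f"coord:{graph.coordinate_space}")
    feats.add(f"mark:{graph.mark}")
    feats.add(f"connected:{graph.connected}")
    return feats


def jaccard_distance(a, b):
    """0.0 for identical feature sets, 1.0 for disjoint ones."""
    fa, fb = features(a), features(b)
    return 1.0 - len(fa & fb) / len(fa | fb)


line_chart = GraphSpec(frozenset({"year", "gdp"}), "cartesian", "point", True)
bar_chart = GraphSpec(frozenset({"year", "gdp"}), "cartesian", "bar", False)
print(jaccard_distance(line_chart, bar_chart))
```

A set-based distance treats every feature as equally important; weighting data overlap more heavily than visual semantics would be a natural variation.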
Slide 55: graphscape & transformation • given a transformation algebra – structural transforms – relational operators • inherently spans multiple “data sets” – this is good, we need to go there • neighborhood function?
Slide 56: graphscape: what for? • navigation (including creation) • search • mashup • data cleaning • schema mining • trend analysis, prediction • etc. photo: rohit (flickr)
Slide 57: 3 stages liberating data (upload/import) exploiting aggregation leveraging community technical challenges in each
Slide 58: graphscape: now add community • tags • comments & shout-outs • anchor text (blog entries) • social network • searches (data & bling) • mashups • data analysis traces. • etc.
Slide 59: community is multifaceted • collaborative visual data analysis – asynchronous presentation and annotation – will get richer as visual tools progress • just think about learning from stage 1 • now apply to all other interactions on graphs – creation/bling, navigation, transformation, etc. • collaborative data curation – cleaning, linking, highlighting – analogies to von ahn’s games • data-cleaning captchas?
Slide 60: community opportunity & challenge • could crack some big open problems – optimism in the data warehousing space • but many challenges arise at scale – noisy user input (errors, spam) – redundancy and inconsistency in data – multifaceted use – data dynamics – …
Slide 61: engaging technologists • new kind of corpus – but not just swivel: spreadsheet silos in lots of organizations – challenge problems (kdd cup?) • swivel as a platform for data mining folk – how do technologists leverage corpus, userbase, ? – functionality of interest
Slide 62: joe hellerstein


