Big dataweb, science, mining

Published in: Technology
  • Welcome, thanks for the invite. Background: assumed you have read my profile. First talk; as an entrepreneur, being thrown in at the deep end is always good, it makes sure you learn to swim fast.
  • 10 min: set the scene. 10 min: what is data science, plus a review of characters in the industry (what they are saying, what is being learnt). Open source: 20 min hands-on code. 10 min Q&A.
  • Started with a dot; physicists tell you a big bang! The data story began. .com: commercial, transaction focus, e-commerce automation, mechanical. Burst. Thinking is a continuum: Web 2.0.
  • Send it to friends, family; share. Openness story. Build applications. Infrastructure, economics. Pace of data, bandwidth. FB (Zuckerberg): open/share ratio growing faster than Moore's law. What happens as we cycle through this and it speeds up? Data web, Web 2.0 squared, Web 3.0 . . .
  • Open and share accelerate (privacy debate: won't go there). How is cloud different from Moore's law? That plus more: Hadoop, more to go in the cloud; don't want per hour, want what I need; NoSQL, data portability, etc. Data science: what does it all mean?
  • Recap and draw conclusions.
  • Live state of physics, 1800. Chairman of Google. Community rallying around data science: Strata conf. Structure, local meetups. How does data live? Characters in the industry I've been reading about; useful to link to their posts to get started.
  • Add three hats graphic: yellow hard hat, professor hat and marketing hat! Dave McClure!
  • Data flow. Clean, keep up to date, include new? (Big problem: if the data with the answer is not included, it doesn't matter how smart your DM is!) Algorithm: magic. Present: communicate, API, portable, feedback loop, etc.
  • Range of businesses: infrastructure (Hadoop, Cloudera), business (LinkedIn), e-commerce (Amazon), health, everything. LL me. Link into data mining.
  • Infrastructure stack
  • Cross source view of world
  • Amazon and eBay talk: keynote tomorrow.
  • Yahoo meetup. James Surowiecki, The Wisdom of Crowds book: prediction markets, choices bet with money are better. What if we replace betting with money with betting with your life? Need to measure life? Set hypothesis, test. Need curiosity to apply ideas. Smart on our own, smarter networked? Only live life in real time. Lots of 'paths' already worn.
  • Next push of the web? From start-ups to existing businesses, the skill set is needed; education market adapting to skill up the workplace. Picture of a cat = curiosity.
  • Picture: small, medium, large show different levels of granularity of data. What hypothesis are you trying to test? Let's go and see what each is useful for.
  • Show live site stats. Need to get screenshot.
  • Got Chrome or FF? Code: open files. Story: show classes of the data lifecycle: clean, make wise, UI, API, RDF. Example, choices made: two words, limit 50, frequency. PLAYING GOT image. Assumption: try to crowdsource everything; getting started, restart once started. Use CouchDB to show top 50. May change two words or limit to 100? Trade-off with speed. We know what the answer will look like? Just getting there. Not always aware of choices made: frequency of matching, weights attached. 'Rule': be consistent. Could be better, but is quantums better than what we have. Learn by doing, i.e. learn by accident! 'Play god' slide.
  • Dave Winer: not so much data for and against; to be used to make what we need.
  • Speak at conf. on the future of language; our job is to persuade in data science, i.e. in this direction.
  • Big dataweb, science, mining

    1. PHPUK Data Web Data Science Data Mining BIG BIG BIG BIG
    2. Agenda
        • BIG DATA WEB
        • BIG DATA SCIENCE
        • BIG DATA MINING
        • SUMMARY
    6. .com
    7. Web 2.0: Moore's Law, economics, bandwidth, social, open/share
    8.  
    9. Data web: Web 2.0 + mobile, cloud computing, data science
    10. In practice
        • Existing data
        • Always working
        • Every webpage personalized
    13. DataWeb Summary
        • Data: expanding at a fast rate
        • Economic free cloud
        • Personalization in real time
        • Science applied to society
    17. Data Science
        • What is data science?
        • Data lifecycle
        • Case studies
    20. What is data science?
        • Combines three areas
        • Engineering
        • Mathematics, statistics, ML
        • Communication: PP to infographic, product, API
    24. Data lifecycle
        • Comes from?
        • Data conditioning
        • Scale
        • Tell a story
        • Intelligence
    25. Case Studies
        • Range of perspectives
        • Cloudera
        • Bitly
        • LinkedIn
        • e-commerce
    30. Cloudera
        • Jeff Hammerbacher
        • http://jeffhammerbacher.com/
        • Video
        • http://www.cloudera.com/?resource=orbitz-ideas-jeff-hammerbacher-evolving-new-analytical-platform-apache-hadoop
        • Enterprise side: Dataspaces
    35. Bitly
        • Hilary Mason
        • http://www.hilarymason.com/
        • Video
        • http://www.youtube.com/watch?v=KWszSUm-x2Y
        • Links across lots of services
    40. LinkedIn
        • Monica Rogati
        • http://www.linkedin.com/in/mrogati
        • Video
        • http://www.forbes.com/sites/danwoods/2011/11/27/linkedins-monica-rogati-on-what-is-a-data-scientist/
        • Core part of product team
    45. e-commerce
        • eBay.com: keynote Saturday morning
        • Amazon.com: John Rauser
        • http://www.forbes.com/sites/danwoods/2011/10/07/amazons-john-rauser-on-what-is-a-data-scientist/
        • Heart of discovery: probability to purchase
    49. Me OSDS
        • Vision
        • Wisdom of Crowds
        • Big made from small
    52. Data Science Summary
        • Go(ing) mainstream
        • Wide variety of applications
        • Curiosity gives edge
    55. Data Mining
        • Types - techniques
        • Examples: Statistics, Text categorisation, SOM
        • Summary
    57. Types - Techniques
        • Granularity
        WWW Blog Post Specific Sentiment
    58. Statistics
        • Simple is beautiful
        • Real time may be the best indicator
    60. Text Categorisation
        • Show me the code
        • Data lifecycle
        • Assumptions
        • Scaling
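
The speaker notes for this slide describe counting two-word phrases by frequency and keeping the top 50. The talk's own code is not part of this transcript, so what follows is only a minimal sketch of that idea in Python. The function names `two_word_phrases` and `top_phrases` and the sample posts are assumptions made for illustration, and the CouchDB storage mentioned in the notes is left out.

```python
import re
from collections import Counter


def two_word_phrases(text):
    """Split text into lowercase words and return adjacent two-word phrases."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def top_phrases(documents, limit=50):
    """Count two-word phrases across all documents and keep the most frequent."""
    counts = Counter()
    for doc in documents:
        counts.update(two_word_phrases(doc))
    return counts.most_common(limit)


# Hypothetical sample posts, not data from the talk.
posts = [
    "Liverpool play in red",
    "Liverpool have a red strip",
    "Everton have a blue strip",
]

for phrase, count in top_phrases(posts, limit=5):
    print(count, phrase)
```

Raising `limit` to 100, or moving from two-word to three-word phrases, means more counting and a longer result list, which is the trade-off with speed that the notes point to.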
    64.  
    65.  
    66.  
    67. BIG: BIG made from small
    68. SOM - Autonomous learning
        • SOM: self-organising maps
        • Dr. Andrew Starkey, Blue Flow Ltd
        • Aberdeen University spin-out
        • http://www.blue-flow.com/
    72. Example sentences
        • Liverpool play in red
        • Liverpool have a red strip
        • Liverpool used to play in blue
        • Liverpool in a red strip
        • Liverpool known for their red strip
        • Everton play in blue
        • Everton have a blue strip
        • Everton known as the bitter blue
        • Everton have a horrible blue strip
        • Everton: don’t like their blue colour
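
To show how a self-organising map could group sentences like these without labels, here is a rough sketch in plain Python. It is not the Blue Flow implementation, which the slides do not show. The bag-of-words encoding, the two-node one-dimensional map, the learning-rate schedule and all function names are assumptions chosen to keep the example small.

```python
import math
import random

SENTENCES = [
    "Liverpool play in red",
    "Liverpool have a red strip",
    "Liverpool used to play in blue",
    "Liverpool in a red strip",
    "Liverpool known for their red strip",
    "Everton play in blue",
    "Everton have a blue strip",
    "Everton known as the bitter blue",
    "Everton have a horrible blue strip",
    "Everton don't like their blue colour",
]


def tokenize(text):
    return text.lower().split()


def build_vocab(sentences):
    return sorted({w for s in sentences for w in tokenize(s)})


def vectorize(sentence, vocab):
    """Binary bag-of-words vector: 1.0 if the word occurs in the sentence."""
    words = set(tokenize(sentence))
    return [1.0 if w in words else 0.0 for w in vocab]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(weights, vec):
    """Index of the map node whose weight vector is closest to vec."""
    return min(range(len(weights)), key=lambda i: distance(weights[i], vec))


def train_som(vectors, n_nodes=2, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a tiny 1-D SOM; learning rate and radius shrink each epoch."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 1e-3)
        order = list(vectors)
        rng.shuffle(order)
        for vec in order:
            bmu = best_matching_unit(weights, vec)
            for i in range(n_nodes):
                # Gaussian neighbourhood on the 1-D line of nodes.
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    weights[i][d] += lr * influence * (vec[d] - weights[i][d])
    return weights


vocab = build_vocab(SENTENCES)
vectors = [vectorize(s, vocab) for s in SENTENCES]
weights = train_som(vectors)
clusters = [best_matching_unit(weights, v) for v in vectors]

for sentence, node in zip(SENTENCES, clusters):
    print(node, sentence)
```

Each sentence is printed with the index of the node it lands on. With so little data the grouping depends on the seed and the encoding, and a sentence such as "Liverpool used to play in blue" may land on either node.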
    73.  
    74.  
    75.  
    76. DM summary
        • No one technique on its own, but a combination
        • Future: more human
        • Emergence platform: Cloudspaces, P2P
        • Personal data (VRM)
    80. PHP - dataweb
        • 40% web CMS leading OS
        • 40% value from data
        • Evolution of the language: LINQ (.NET) example
        • https://github.com/dahlia/phunctional
    84. Summary
        • Data Web here
        • Personalised start to everything
        • Society science
        • Life = Information
    88. Thank you
        • Q & A
        • Feedback https://joind.in/4955
        • http://lanyrd.com/2012/php-uk-conference/sptkm/
        • Contact
        • James Littlejohn
        • @aboynejames [email_address]
        • +44 7521580938
