Course Tech 2013, Mark Frydenberg, Drinking from the Fire Hose: Tools for Interpreting and Teaching with Big Data

There is a flood of information online from tweets, feeds, status updates, photos, and government, private, and other
sources. Just how big is “big data”? This presentation will share examples of big and open data in the cloud: where it
comes from, how it’s stored, and what you can do with it. Learn how to bring real-world online data into your classes
for students to analyze using Excel, how to create data visualizations and infographics, and how Data as a Service
works as a model for cloud computing.
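To make the question “just how big is big data?” concrete, here is a small Python sketch that walks the byte units up to the yottabyte and applies them to the Walmart figure quoted on slide 13. It assumes decimal (SI) units, where each step is a factor of 1,000, so a petabyte is 10^15 bytes, matching “one quadrillion bytes”; storage tools often use binary units (1,024) instead. The helper names `unit_size` and `human_readable` are hypothetical.

```python
# Decimal (SI) byte units: each step up the list is a factor of 1,000.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_size(unit: str) -> int:
    """Return the number of bytes in one `unit`, using powers of 1,000."""
    return 1000 ** UNITS.index(unit)


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the first unit where the value drops below 1,000."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:,.1f} {unit}"
        value /= 1000
    return f"{value:,.1f} {UNITS[-1]}"


# Walmart figure from the talk: more than 2.5 petabytes of transaction data per hour.
walmart_per_hour = 2.5 * unit_size("PB")
walmart_per_day = 24 * walmart_per_hour
walmart_per_second = walmart_per_hour / 3600

print(unit_size("PB"))                     # 1000000000000000, one quadrillion bytes
print(human_readable(walmart_per_day))     # 60.0 PB
print(human_readable(walmart_per_second))  # 694.4 GB
```

At 2.5 PB per hour, that works out to 60 PB per day, or roughly 694 GB every second.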

Speaker Notes
  • Six Degrees of Kevin Bacon. (The name is dumb luck.) Six degrees of separation: within networks of people or things, the popular idea is that any two nodes are linked by a chain of no more than about six steps. That’s the Bacon Index: Bob is 1, Ann is 2, Joe is 3. The index can only get so big because of interconnections. If Kim is connected to Bob, Kim is 2, not 4.
  • Twitter data can’t easily be structured. Twitter is a bunch of words, and humans are the best at parsing words. And so again we’re back to the 3 V’s: Volume, Velocity, and Variety. Not only is Twitter’s data disorganized, Twitter handles over 3,000 new tweets per second. Twitter uses this data to recommend things to you, and it does it all lightning fast through an engine called Storm.
  • If Amazon can see that lots of people buy forks and knives together, or curtains and curtain rods together, how does it avoid recommending that everyone who bought a wrench set also buy a copy of Black Beauty, just because someone else bought both? This is where things get complicated.
  • Twitter isn’t the only place where unstructured, real-time data is being processed. Facial recognition is a massive big data problem. Your iPhone does facial recognition. Facebook does facial recognition. Aperture learns about faces from hundreds of data points and can help you find who is in which photos. Amazing. How do we do this so quickly?
  • Should it be opt-in only?
  • Here is a blood pressure monitor from iHealth that stores your blood pressure data in the cloud.
  • Here’s an app that monitors your heart rate from your phone’s camera. Amazing stuff. So all this wellness data is now being collected ubiquitously. How can it be used securely and effectively to make all of us healthier? This is the big data problem in health care.
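The Bacon Index in the speaker notes is a shortest-path length in a social graph, and breadth-first search (BFS) computes it directly, because BFS reaches each person first along a shortest chain of links. This is a minimal Python sketch; the edges are a hypothetical reconstruction of the slide’s diagram, including an assumed link from Joe to Kim that on its own would give Kim an index of 4. The function names are illustrative only.

```python
from collections import deque


def connect(graph: dict[str, set[str]], a: str, b: str) -> None:
    """Add an undirected link (for example, 'appeared in a movie with') between a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def bacon_index(graph: dict[str, set[str]], center: str) -> dict[str, int]:
    """Breadth-first search: fewest hops from `center` to every reachable node."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, set()):
            # In BFS the first time we reach a node is always along a shortest path.
            if neighbor not in distance:
                distance[neighbor] = distance[person] + 1
                queue.append(neighbor)
    return distance


# Hypothetical network based on the slide: Kevin - Bob - Ann - Joe.
network: dict[str, set[str]] = {}
connect(network, "Kevin", "Bob")
connect(network, "Bob", "Ann")
connect(network, "Ann", "Joe")
connect(network, "Joe", "Kim")  # assumed link: through Joe alone, Kim would be 4
connect(network, "Bob", "Kim")  # a direct link to Bob makes Kim 2 instead

index = bacon_index(network, "Kevin")
print(index["Bob"], index["Ann"], index["Joe"], index["Kim"])  # 1 2 3 2
```

Adding the Bob-Kim link drops Kim from 4 to 2, which is the point of the note: in a densely interconnected network the index can only get so big.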

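The Amazon note describes recommending products that tend to appear in the same orders. The sketch below uses plain pair co-occurrence counts with a minimum-support threshold, one simple way to keep a single coincidental purchase (the wrench set and Black Beauty) from turning into a recommendation. It is an illustration under assumed data, not Amazon’s actual algorithm; the order history and the `min_support` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets: list[set[str]]) -> Counter:
    """Count how many baskets contain each unordered pair of items."""
    pairs: Counter = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical (a, b) order.
        for a, b in combinations(sorted(basket), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(item: str, pairs: Counter, min_support: int = 2) -> list[str]:
    """Items bought with `item` in at least `min_support` baskets, most frequent first."""
    scores: Counter = Counter()
    for (a, b), count in pairs.items():
        if count < min_support:
            continue  # ignore one-off coincidences such as a wrench set with Black Beauty
        if a == item:
            scores[b] = count
        elif b == item:
            scores[a] = count
    return [other for other, _ in scores.most_common()]


# Hypothetical order history.
orders = [
    {"forks", "knives"},
    {"forks", "knives", "napkins"},
    {"curtains", "curtain rods"},
    {"curtains", "curtain rods", "forks"},
    {"wrench set", "Black Beauty"},
]
pairs = co_occurrence(orders)
print(recommend("forks", pairs))       # ['knives']
print(recommend("curtains", pairs))    # ['curtain rods']
print(recommend("wrench set", pairs))  # []
```

Raising or lowering `min_support` trades coverage for reliability; real recommenders usually go further, for example by normalizing for how popular each item is, which this sketch does not do.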
Transcript

  • 1. Drinking from the Fire Hose: Tools for Interpreting and Teaching with Big Data. Mark Frydenberg, Bentley University
  • 2. CourseMate Enhanced Edition
  • 3. 77 Movies and TV Shows!
  • 4. What’s your Bacon Index? (Diagram: Kevin at the center; Bob 1, Ann 2, Joe 3; Kim 2, with 4 crossed out.)
  • 5. APIs
  • 6. Friend of a Friend
  • 7. Social Graph
  • 8. Big Data. Big data refers to a collection of tools, techniques and technologies which make it easy to work with data at any scale. (powerof60.com)
  • 9. The Road
  • 10. 3 Vs. Volume: the amount of data is larger than conventional relational database infrastructures can handle. Velocity: the rate at which data is generated, processed and analyzed in (real) time. Variety: data formats are unstructured and inconsistent.
  • 11. Volume: How Big is Big Data?
  • 12. Yottabyte?
  • 13. Walmart. Walmart collects more than 2.5 petabytes of data every hour from its customer transactions. A petabyte is one quadrillion bytes, or the equivalent of about 20 million filing cabinets’ worth of text. http://hbr.org/2012/10/big-data-the-management-revolution/ar
  • 14. Velocity: Drinking from the Firehose. Scrutinize 5 million trade events created each day to identify potential fraud. Analyze 500 million daily call detail records in real time to predict customer churn faster.
  • 15. A Variety of Big Data Sources
  • 16. McKinsey & Company Report (2011). Data is part of every industry and business function. Data creates value. Big data becomes a basis of competition and growth. Some sectors will achieve greater gains. Shortage of people with analytical skills. Need policies related to privacy, security, and ownership.
  • 17. Twitter
  • 18. Twitter. 3,000 tweets per second. Data is disorganized. How does Twitter use its data?
  • 19. Twitter Visualization
  • 20. Big Data Technologies. Hadoop: scalable storage, parallel computation. NoSQL: distributed querying.
  • 21. What This Means. Change your web page and Google finds it in minutes. Ten years ago, you would have had to submit a request to Yahoo! to reindex your site. All you need is a lot of servers. Google has a million of them. No problem.
  • 22. http://aws.amazon.com/big-data/
  • 23. Collaborative Filtering
  • 24. Collaborative Filtering. (Diagram labels: Me, You, The Black Stallion, Black Beauty, Camera, Tripod.)
  • 25. Variety: Semantic Web
  • 26. RelFinder
  • 27. Unstructured Data
  • 28. Health Care
  • 29. Analyzing Big Data
  • 30. explore.data.gov
  • 31. Searching Big Data
  • 32. Fusion Table Visualizations
  • 33. Fusion Table Visualizations
  • 34. Fusion Table Visualizations
  • 35. Mark Frydenberg, mfrydenberg@bentley.edu, cis.bentley.edu/mfrydenberg. CourseMate Enhanced Edition. Invite me to your school!
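Slide 20 lists Hadoop for parallel computation, and the Twitter slides describe a fast stream of disorganized text. Hadoop’s core programming model is MapReduce: mappers emit key/value pairs, the framework groups them by key (the shuffle), and reducers combine each group. The sketch below simulates those three phases on one machine in plain Python to count words in a few hypothetical tweets. It does not use Hadoop’s API, and on a real cluster the map and reduce calls would run in parallel across many servers.

```python
import re
from collections import Counter, defaultdict
from collections.abc import Iterable, Iterator


def map_phase(tweet: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word, hashtag, or mention in one tweet."""
    for word in re.findall(r"[#@]?\w+", tweet.lower()):
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value by its key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all the counts emitted for one word."""
    return key, sum(values)


def word_count(tweets: list[str]) -> Counter:
    """Run map, shuffle, and reduce in sequence over a list of tweets."""
    emitted = (pair for tweet in tweets for pair in map_phase(tweet))
    grouped = shuffle(emitted)
    return Counter(dict(reduce_phase(k, v) for k, v in grouped.items()))


# Hypothetical tweets.
tweets = [
    "Big data is big! #bigdata",
    "Teaching with #bigdata and Excel",
    "Is big data just a lot of data?",
]
counts = word_count(tweets)
print(counts.most_common(3))
```

Because each mapper looks at one tweet and each reducer looks at one word, neither needs to see the whole data set, which is what lets the work spread across many machines.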