Transforming instagram data into location intelligence


  1. Data Science Innovation: Transforming Instagram Data Into Location Intelligence and Internet of Things, April 2014
  2. Topic Areas: (1) Statistics, Data Mining or Data Science? (2) Data Science workflows and discovery (3) Research informing our thinking about location intelligence (4) Data Science innovation and exploratory analysis (5) Motivations for the Instagram project (6) Pattern mining trajectories/data mining (7) Instagram analytics tools (8) NoSQL: MongoDB (9) Datafication 3 back end (walkthrough) (10) Location social recommender system (11) Q&A
  3. Statistics, Data Mining or Data Science? • Statistics: precise, deterministic causal analysis over precisely collected data • Data Mining: deterministic causal analysis over carefully sampled, re-purposed data • Data Science: trending/correlation analysis over existing data covering the bulk of the population, i.e. big data. Adapted from: NIST Big Data taxonomy draft report (see /show_InputDoc.php)
  4. Data Science Workflows & Discovery
  5. Useful References Informing our Thinking about Location Intelligence • Silva et al. (2013), A comparison of Foursquare and Instagram to the study of city dynamics and urban social behavior, Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing: Instagram and Foursquare datasets might be compatible in finding the popular regions of a city. • Chaoming Song et al. (2010), Limits of Predictability in Human Mobility, Science: There is a potential 93% average predictability in user mobility, an exceptionally high value rooted in the inherent regularity of human behavior. Yet it is not the 93% predictability that we find the most surprising. Rather, it is the lack of variability in predictability across the population. • Scellato et al. (2011), NextPlace: A Spatio-temporal Prediction Framework for Pervasive Systems, Proceedings of the 9th International Conference on Pervasive Computing (Pervasive'11): daily and weekly routines => few significant places every day => regularity in human activities => regularity leads to predictability.
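The 93% figure from Song et al. is an upper bound derived from entropy: given the entropy S of a user's location history over N distinct places, Fano's inequality, S = H(Π) + (1 - Π) log2(N - 1) with H the binary entropy, fixes the best accuracy Π_max that any predictor could reach. The Python sketch below illustrates that step only. It assumes locations are already discretised into labels, and it plugs in the plain Shannon entropy of visit frequencies rather than the Lempel-Ziv estimate of time-correlated entropy used in the paper, so it gives a more conservative (lower) bound. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(locations):
    """Entropy (bits) of visit frequencies, ignoring the order of visits."""
    counts = Counter(locations)
    total = len(locations)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fano_rhs(p, n):
    """Right-hand side of Fano's inequality: H(p) + (1 - p) * log2(n - 1)."""
    h = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return h + (1 - p) * math.log2(n - 1)


def max_predictability(entropy, n, tol=1e-9):
    """Solve entropy == fano_rhs(p, n) for p in [1/n, 1] by bisection."""
    if n < 2:
        return 1.0                      # a single location is always predictable
    lo, hi = 1.0 / n, 1.0
    if entropy >= math.log2(n):
        return lo                       # maximal entropy: no better than guessing
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # fano_rhs falls as p rises on [1/n, 1], so a value above the
        # target entropy means the solution lies at a larger p
        if fano_rhs(mid, n) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


history = ["home"] * 9 + ["work"]       # hypothetical discretised location history
print(round(max_predictability(shannon_entropy(history), len(set(history))), 3))  # prints 0.9
```

Because higher entropy means less regularity, the bound falls towards 1/N as a user's visits spread evenly across places.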
  6. Useful References Informing our Thinking about Location Intelligence (continued) • De Domenico, M., Lima, A. and Musolesi, M. (2012), Interdependence and Predictability of Human Mobility and Social Interactions, Proceedings of the Nokia Mobile Data Challenge Workshop: “we have shown that it is possible to exploit the correlation between movement data and social interactions in order to improve the accuracy of forecasting of the future geographic position of a user. In particular, mobility correlation, measured by means of mutual information, and the presence of social ties can be used to improve movement forecasting by exploiting mobility data of friends. Moreover, this correlation can be used as indicator of potential existence of physical or distant social interactions and vice versa.” • Sadilek, A. and Krumm, J. (2012), Far Out: Predicting Long-Term Human Mobility: Where are you going to be 285 days from now at 2pm? “…we show that it is possible to predict location of a wide variety of hundreds of subjects even years into the future and with high accuracy.”
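De Domenico et al. measure the mobility correlation between users with mutual information. The sketch below is a minimal plug-in estimate. It assumes that two users' locations have already been discretised (e.g. into grid cells) and aligned to the same time slots. That alignment, the labels and the function name are assumptions of this sketch, not the paper's exact procedure.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (bits) between two time-aligned location sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and aligned to the same time slots")
    n = len(xs)
    joint = Counter(zip(xs, ys))        # how often each (x, y) pair co-occurs
    px, py = Counter(xs), Counter(ys)   # marginal visit counts
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical hourly grid cells for two friends
alice = ["cbd", "cbd", "uni", "uni", "beach", "cbd"]
bob = ["cbd", "cbd", "uni", "uni", "beach", "uni"]
print(round(mutual_information(alice, bob), 3))
```

A value of 0 means that knowing where one user is says nothing about the other. Larger values indicate correlated movement that a forecaster could exploit.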
  7. Useful References Informing our Thinking about Location Intelligence (continued) • “One of the most fascinating aspects of location-based data is the stability and predictability of patterns that can be mined from seemingly unrelated data. A cluster of random dots on a map can represent a daily transportation route, the most popular dating spots or the neighborhoods with the highest concentration of gang violence. These patterns, analyzed over time and in large numbers, begin to allow for informed predictions of behaviors and events. For government, this analytical capability enables better resource allocation and more effective outcomes.” Interview with G. Edward DeSeve, former White House ARRA chief administrator, December 15, 2011, quoted in “The power of zoom: Transforming government through location intelligence” by Deloitte Consulting LLP. Source: UnitedStates/Local%20Assets/Documents/Federal/us_fed_govlab_power_of_zoom_report_100212.pdf
  8. Useful NSW Government Resources on Location Intelligence • NSW Globe: uses Google Earth to explore spatial data and images • NSW Location Intelligence Strategy (April 2014): NSW Location Intelligence Strategy.pdf • NSW Government datasets
  9. Data Science Innovation: a data science innovation is something an organization has not done before, or even something nobody anywhere has done before. It focuses on discovering and using new or untraditional data sources to solve new problems. Adapted from: Franks, B. (2012), Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons
  10. The ANZ Heavy Traffic Index comprises flows of vehicles weighing more than 3.5 tonnes (primarily trucks) on 11 selected roads around NZ; it moves contemporaneously with GDP growth. The ANZ Light Traffic Index is made up of light or total traffic flows (primarily cars and vans) on 10 selected roads around the country; it gives a six-month lead on GDP growth.
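A claim such as "a six-month lead on GDP growth" can be checked by correlating the leading series at month t with the target at month t + lag, then finding which lag correlates best. The sketch below does this with a plain Pearson correlation on hypothetical monthly series. The series, the 12-month search window and the function names are illustrative assumptions, not ANZ's methodology. In practice both series would normally be converted to growth rates first, because shared trends inflate correlations between levels.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def lagged_correlation(leading, target, lag):
    """Correlate leading[t] with target[t + lag] over the overlapping months."""
    if len(leading) != len(target):
        raise ValueError("series must cover the same months")
    if not 0 <= lag <= len(leading) - 2:
        raise ValueError("lag leaves fewer than two overlapping points")
    return pearson(leading[:len(leading) - lag], target[lag:])


def best_lag(leading, target, max_lag):
    """Lag (in months) at which the leading series tracks the target most closely."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(leading, target, k))


# Hypothetical series: GDP growth echoes the traffic index six months later
traffic = [math.sin(i / 3) + 0.05 * i for i in range(48)]
gdp = [0.0] * 6 + traffic[:-6]
print(best_lag(traffic, gdp, 12))  # prints 6
```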
  11. Discovery (Exploratory) Analytics vs. Business Intelligence • Discovery analytics (axis: richness of new sources): exploratory, unstructured, machine learning, data mining, complex analysis, data diversity • Business intelligence (axis: speed of query): dashboards, real-time decisioning, alerts, fresh data, response time
  12. Data Science Innovation: New Sources of Information for Data-Driven Applications and the Internet of Things • Number of journeys made • Distances travelled • Types of roads used • Speed • Time of travel • Levels of acceleration and braking • Any accidents which may occur • The Industrial Ecology Lab: towards an integrated Australian research platform
  13. Black Box Insurance • Telematics technology (the black box) helps assess driving behavior and deliver truly driver-centric premiums by capturing: – Number of journeys – Distances travelled – Types of roads – Speed – Time of travel – Acceleration and braking – Any accidents • Benefits low-mileage, smooth and safe drivers • Privacy vs. saving money on insurance (Canada)
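Several items on the capture list above map naturally onto per-driver features. The sketch below assumes the black box reports timestamped speed samples. Real devices also log GPS and accelerometer data. The 3 m/s² harsh-braking threshold, the five-minute gap that separates journeys and the field names are hypothetical choices, not an insurer's scoring rules.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float      # seconds since start of recording
    speed: float  # metres per second


def driving_features(samples, harsh_brake=3.0, journey_gap=300.0):
    """Summarise speed samples into premium-relevant features.

    harsh_brake: deceleration (m/s^2) counted as a harsh braking event (assumed value).
    journey_gap: silence in seconds that starts a new journey (assumed value).
    """
    samples = sorted(samples, key=lambda s: s.t)
    features = {"journeys": 1 if samples else 0, "distance_m": 0.0,
                "max_speed_ms": max((s.speed for s in samples), default=0.0),
                "harsh_brakes": 0}
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt > journey_gap:
            features["journeys"] += 1          # long silence: engine off, new journey
            continue
        if dt <= 0:
            continue                           # skip duplicate timestamps
        features["distance_m"] += (prev.speed + cur.speed) / 2 * dt  # trapezoid rule
        if (prev.speed - cur.speed) / dt > harsh_brake:
            features["harsh_brakes"] += 1
    return features


trip = [Sample(0, 0), Sample(10, 10), Sample(20, 10), Sample(22, 0),
        Sample(1000, 0), Sample(1010, 5)]
print(driving_features(trip))
# {'journeys': 2, 'distance_m': 185.0, 'max_speed_ms': 10, 'harsh_brakes': 1}
```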
  14. Internet of Things: “trillion sensors”
  15. Smartphone, Google Glass or Apple Watch Will Know What You Want Before You Do: “…from 2014 your phone [glasses or watch] will anticipate your needs, do the research, tell you what you want to know – sometimes before the question even occurs to you…” Chapman, Jake (2013), The Wired World in 2014
  16. Push Notification Providers: 1. Appboy 2. Urban Airship 3. StackMob 4. Parse … 11. mBlox … 15. Kahuna
  17. Mobile Relationship Management Workflow (Urban Airship): What? When? Where?
  18. Apple Passbook Styles (Urban Airship)
  19. Motivations for the Instagram Project • Trajectory data (not i.i.d., i.e. not independent and identically distributed) • A new authentication approach based on trajectories • Predictive capability of phones, glasses and watches • Internet of Things (sensors, RFID, wheelchairs and drones) • Indoor GPS • Car parking “anywhere” • Location-based services, e.g. advertising • Tourist recommender system • Food analytics and traceability (farm to fork) • Mobile apps with trajectory data, e.g. Foursquare, Instagram, Nike+, EveryTrail • Insurance: “pay as you drive” telematics black-box-based insurance policies
  20. Pattern Mining Trajectories • From a group of trajectories to trajectory patterns: (1) hot regions (the basic unit); (2) a trajectory pattern is a relationship amongst regions • Opportunities: location-based networks, destination prediction, car-pooling, personal route planning, group buying, loyalty, credit card data • Adapted from: Chang, Wei, Yeh and Peng, “Discovering Personalised Routes from Trajectories”, ACM LBSN’11, Chicago, Illinois, USA, 1 November 2011
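This slide has two main ideas: hot regions are the basic unit, and a pattern is a relationship between regions. Both can be shown with a deliberately simple stand-in for the method in Chang et al. The sketch snaps points to a fixed grid and keeps cells visited often enough. It then counts moves between consecutive hot cells. The most frequent move out of a cell gives a naive destination prediction. The grid size, support threshold and function names are assumptions of this sketch.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 0.01  # assumed grid size: roughly 1 km of latitude


def cell(lat, lon, size=CELL_DEG):
    """Snap a point to a square grid cell (the 'region' in this simplified sketch)."""
    return (math.floor(lat / size), math.floor(lon / size))


def hot_regions(trajectories, min_support=2):
    """Cells visited at least min_support times across all trajectories."""
    counts = Counter(cell(lat, lon) for traj in trajectories for lat, lon in traj)
    return {c for c, n in counts.items() if n >= min_support}


def region_transitions(trajectories, hot):
    """Count moves between consecutive distinct hot regions in each trajectory."""
    transitions = defaultdict(Counter)
    for traj in trajectories:
        regions = [c for c in (cell(lat, lon) for lat, lon in traj) if c in hot]
        for a, b in zip(regions, regions[1:]):
            if a != b:
                transitions[a][b] += 1
    return transitions


def predict_next(transitions, region):
    """Most frequent next hot region, or None if the region was never left."""
    followers = transitions.get(region)
    return followers.most_common(1)[0][0] if followers else None


# Hypothetical Sydney points (lat, lon)
A, B, C = (-33.865, 151.205), (-33.875, 151.215), (-33.855, 151.225)
trajs = [[A, B], [A, B], [A, C]]
hot = hot_regions(trajs)
print(predict_next(region_transitions(trajs, hot), cell(*A)) == cell(*B))  # True
```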
  21. Open Source Artifact Highlighting 68 Data Mining Algorithms
  22. First Australian Instagram Study Conducted by UTS:AAI
  23. Why Is Instagram Popular? • Mobile photo-sharing app + social network • Mobile-first workflow: – take or select a picture => crop/filter => geo-tag/hashtag/description/share • Instagram is “Twitter but with photo updates” • Status updates are transformed photos • By default, pictures and accounts are public • Pictures include: – geolocation, hashtags, comments and likes • Designed for mobile rather than desktop
  24. Instagram Analytics Tools (off the shelf) • Statigram – Lifetime likes – Total comments – New followers in the last 7 days – Most-liked photos • Simply Measured – Total engagement across Instagram, Facebook and Twitter – Engaging photo/filter/location – Top photos by date – Active commenters – Best time for engagement – Best day for engagement – Top filters • Nitrogram – Countries of followers – Most engaging – Most commented – Likes and comments on a photo
  25. MongoDB: An Innovation in Databases? • “MongoDB gets the job done” • “document-oriented NoSQL database” • “MongoDB is natural choice when dealing with JSON” • “Same data model in code = same model in database” • “Data structure store to model applications” • “In MongoDB Instagram post can be stored in single collection and stored exactly as represented in the program as one object. In a relational database an Instagram post would occupy multiple tables.” • “MongoDB understands geo-spatial co-ordinates and supports geo-spatial indexing” • Initial MongoDB prototype on Red Hat OpenShift (public/private or community “Platform as a Service”) • Recommendation engine integrating Mahout libraries and MongoDB (see Roadmap) • As discussed at “Journey to MongoDB: Trajectory Pattern Mining in Australian Instagram” by Suresh Sood and Xinhua Zhu, Sydney MongoDB Meetup, 30 April 2013
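The claim that an Instagram post fits in one document, not several relational tables, can be made concrete. The sketch below builds such a document as a Python dict. Comments are embedded in it, and the location is stored as a GeoJSON point. It also builds a $near filter for proximity search. The raw field names are hypothetical and are not the project's Data Model (V2.0). The commented pymongo calls show typical usage, but they need a running server and a 2dsphere index, so they are not executed here.

```python
def to_document(post):
    """Map a raw post (hypothetical field names) to a single MongoDB-style document."""
    return {
        "_id": post["id"],
        "user": {"id": post["user_id"], "username": post["username"]},
        "created_time": post["created_time"],
        "tags": [t.lower() for t in post.get("tags", [])],
        "likes": post.get("likes", 0),
        "comments": post.get("comments", []),  # embedded array, not a separate table
        # GeoJSON point; MongoDB expects [longitude, latitude] order
        "location": {"type": "Point", "coordinates": [post["lon"], post["lat"]]},
    }


def near_query(lat, lon, max_metres):
    """Filter for posts within max_metres of a point (needs a 2dsphere index)."""
    return {"location": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [lon, lat]},
        "$maxDistance": max_metres,
    }}}


# With pymongo and a running server, this would be used roughly as:
#   posts.create_index([("location", "2dsphere")])
#   posts.insert_one(to_document(raw_post))
#   nearby = posts.find(near_query(-33.8568, 151.2153, 500))
```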
  26. JSON Sources Driving the Internet of Things • RaZberry • Teradata • Google
  27. MongoDB Analytics Support of the Instagram Project • Rich query language • Native secondary indexes • Geospatial indexes & search • Text indexes & search • Aggregation framework (see the MongoDB documentation for Release 2.4.9) • Map-Reduce (JavaScript) implementation • Client-side analytics
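The aggregation framework listed above is one way to produce the Popular Hashtag Analysis screen. The minimal sketch below assumes each post document stores its hashtags in a tags array (a hypothetical field name). The pipeline unwinds the tags, counts them and keeps the top k. A plain-Python twin lets the result be checked on small samples. Tie ordering may differ between the two.

```python
from collections import Counter


def top_hashtags_pipeline(k=10):
    """Aggregation pipeline: one row per tag occurrence, counted, most frequent first."""
    return [
        {"$unwind": "$tags"},
        {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
        {"$limit": k},
    ]


def top_hashtags_local(docs, k=10):
    """Same computation in plain Python, handy for checking small samples."""
    counts = Counter(tag for d in docs for tag in d.get("tags", []))
    return [{"_id": tag, "count": n} for tag, n in counts.most_common(k)]


# With pymongo: list(posts.aggregate(top_hashtags_pipeline(20)))
sample = [{"tags": ["sydney", "beach"]}, {"tags": ["sydney"]}, {}]
print(top_hashtags_local(sample, 2))
# [{'_id': 'sydney', 'count': 2}, {'_id': 'beach', 'count': 1}]
```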
  28. Architectural Implementation using MongoDB (diagram components: Instagram via API; Data Collection; Mongo database distributed across shards; Name Node; Map Reduce; Stats)
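The Map Reduce and Stats components in this architecture can be illustrated with the peak usage time statistic. MongoDB's own mapReduce command takes JavaScript functions. The Python sketch below only mimics the map, shuffle and reduce steps in a single process. It assumes created_time is a Unix timestamp. It converts that timestamp with a fixed UTC+10 offset as a stand-in for Sydney time, ignoring daylight saving.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

SYDNEY = timezone(timedelta(hours=10))  # assumed fixed offset (AEST); ignores daylight saving


def map_post(post):
    """Map: emit (local hour of day, 1); created_time is assumed to be Unix seconds."""
    hour = datetime.fromtimestamp(post["created_time"], tz=SYDNEY).hour
    yield hour, 1


def reduce_counts(key, values):
    """Reduce: total posts for one hour (a sum can safely be re-reduced)."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """Single-process stand-in for a sharded map-reduce job."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)   # shuffle: collect emitted values per key
    return {key: reducer(key, values) for key, values in groups.items()}


def peak_hour(docs):
    """Local hour of day with the most posts."""
    counts = map_reduce(docs, map_post, reduce_counts)
    return max(counts, key=counts.get)


posts = [{"created_time": 0}, {"created_time": 9 * 3600}, {"created_time": 9 * 3600 + 60}]
print(peak_hour(posts))  # 19
```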
  29. Client for the Instagram Project
  30. Timeline-based Trajectory Analysis
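Timeline-based trajectory analysis starts from each user's geotagged posts, ordered in time. One simple way to turn them into trajectories is to split a user's timeline wherever consecutive posts are more than a fixed gap apart. The sketch below does this under stated assumptions: the six-hour gap and the post field names are hypothetical.

```python
from collections import defaultdict

MAX_GAP = 6 * 3600  # assumed: a silence longer than 6 hours starts a new trajectory


def build_trajectories(posts, max_gap=MAX_GAP):
    """Group geotagged posts per user, order them in time, and split on long gaps.

    Each post is a dict with user_id, created_time (Unix seconds), lat and lon.
    Returns {user_id: [[(lat, lon), ...], ...]}.
    """
    by_user = defaultdict(list)
    for post in posts:
        by_user[post["user_id"]].append(post)
    result = {}
    for user, user_posts in by_user.items():
        user_posts.sort(key=lambda p: p["created_time"])
        trips, current = [], []
        for post in user_posts:
            if current and post["created_time"] - current[-1]["created_time"] > max_gap:
                trips.append(current)   # gap too long: close the current trajectory
                current = []
            current.append(post)
        if current:
            trips.append(current)
        result[user] = [[(p["lat"], p["lon"]) for p in trip] for trip in trips]
    return result
```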
  31. Google Map-based Trajectory Analysis
  32. Social Relationship Analysis
  33. Location-based Retrieval
  34. Popular Hashtag Analysis
  35. Popular Image Analysis
  36. Peak Usage Time Analysis
  37. Active User Analysis
  38. Roadmap: Data collection • Individual (group) analysis: find preference and behavior patterns (including trajectory patterns) • Recommendation: recommend the right product (or service) to the right person (or group) at the right time and place • Manually / Automatically
  39. MongoDB, Mahout or Mortar Recommender (diagram) • Trajectories • Points of Interest • User profiles • Image details • Recommender engine (Mahout or Mortar) algorithms • MongoDB Connector for Hadoop version 1.2.0 • Recommended trajectories
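Mahout supports item-based recommenders, which score unseen items by their similarity to items a user already liked. The plain-Python sketch below applies that idea to points of interest, using cosine similarity over users' visit counts. It is not Mahout's or Mortar's API. The data layout ({user: {poi: visit_count}}) and the names are assumptions.

```python
import math
from collections import defaultdict


def item_similarities(visits):
    """Cosine similarity between points of interest, from {user: {poi: visit_count}}."""
    vectors = defaultdict(dict)            # poi -> {user: visit_count}
    for user, pois in visits.items():
        for poi, count in pois.items():
            vectors[poi][user] = count
    norms = {poi: math.sqrt(sum(c * c for c in vec.values())) for poi, vec in vectors.items()}
    sims = defaultdict(dict)
    pois = list(vectors)
    for i, a in enumerate(pois):
        for b in pois[i + 1:]:
            dot = sum(c * vectors[b].get(u, 0) for u, c in vectors[a].items())
            if dot:
                sims[a][b] = sims[b][a] = dot / (norms[a] * norms[b])
    return sims


def recommend(user, visits, sims, k=3):
    """Rank unvisited POIs by similarity to the user's visited POIs, weighted by visits."""
    seen = visits.get(user, {})
    scores = defaultdict(float)
    for poi, count in seen.items():
        for other, sim in sims.get(poi, {}).items():
            if other not in seen:
                scores[other] += sim * count
    return sorted(scores, key=scores.get, reverse=True)[:k]


visits = {  # hypothetical visit counts
    "alice": {"opera_house": 1, "bondi": 1},
    "bob": {"opera_house": 1, "bondi": 1, "manly": 1},
    "carol": {"manly": 1, "taronga": 1},
}
print(recommend("alice", visits, item_similarities(visits)))  # ['manly']
```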
  40. Supporting Documentation • Instagram project documentation: Data Model and Data Collection Procedure (V2.0) • MongoDB Aggregation and Data Processing, Release 2.4.9