
Learn about Your Location (Using ALL Your Data)

1. Learn About Your Location (Using All Your Data)
   Machine Learning in Data Fusion and Analysis
   Taryn Price and Courtney Shindeldecker, April 13, 2017
2. Outline
   • Data fusion example: Chicago data (Wikipedia, imagery, crime events, OpenStreetMap)
   • Embedding space: what is it, and what can we do with it?
   • Three types of embedding-space usage: similarity search, classification, clustering
3. Introducing: Chicago (Why Chicago?)
   • A large variety of data sources is available: Open Data Portal, Wikipedia, OpenStreetMap, satellite imagery
   • A large variety of data types is available:
     • Shapefiles (subway lines, parks, building footprints)
     • “Flat files” (crime events, lobbyist registration, building permits)
     • Text (descriptions of locations and/or events)
     • Images (satellite)
4. Triples to embedding space
5. Chicago Wikipedia
   • 1,652 Chicago-area articles
   • Convert articles to document embeddings
   • Find nearest neighbors (6)
   • Resulting triple: <wiki01> <similarText> <wiki04>
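
The deck does not name the document-embedding method; one plausible realization is gensim's Doc2Vec. A minimal sketch, assuming texts (tokenized article bodies) and ids (article identifiers such as "wiki01") are loaded elsewhere:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# texts: tokenized article bodies; ids: matching identifiers such as "wiki01" (assumed)
docs = [TaggedDocument(words=tokens, tags=[doc_id]) for tokens, doc_id in zip(texts, ids)]
model = Doc2Vec(docs, vector_size=100, epochs=20, min_count=2)

# the six nearest neighbors of one article in document-embedding space
neighbors = model.dv.most_similar("wiki01", topn=6)
```
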
6. Chicago Wikipedia: similarity example
   • “The Eisenhower Public Library District is a public library located in Harwood Heights, Illinois, one of two suburbs completely surrounded by but not incorporated into Chicago.”
   • “The Pritzker Military Museum & Library (formerly Pritzker Military Library) is a museum and a research library for the study of military history in Chicago, Illinois, US.”
7. Chicago imagery
   • Subset the crime data: 5,000 “events”
   • Download imagery at zoom level 18
   • Create chips out of the image(s)
   • Select chips that contain a crime event from the subset (4,760 chips)
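
The deck gives no chip size or overlap; a minimal chipping sketch using a non-overlapping grid (the 256-pixel size is illustrative):

```python
def make_chips(image, size=256):
    """Cut a large (H, W, 3) image array into non-overlapping square chips."""
    h, w = image.shape[:2]
    return [image[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]
```

Selecting the chips that contain a crime event additionally requires the tile georeferencing, so each event's latitude/longitude can be mapped to pixel coordinates.
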
8. Chicago imagery
   • Extract image-chip features with a VGG-19 CNN classifier
   • Use the last layer before the softmax as chip features
   • Apply the L2 norm
   • Find nearest neighbors (10)
   • Resulting triple: <chip01> <similarImgTo> <chip07>
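
A sketch of the feature-extraction step with Keras's pretrained VGG-19, where "fc2" is the last fully connected layer before the softmax; chips is assumed to be an (n, 224, 224, 3) array built as above:

```python
import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.models import Model
from sklearn.neighbors import NearestNeighbors

base = VGG19(weights="imagenet")
# "fc2" is the last layer before the softmax in Keras's VGG-19
feature_model = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

feats = feature_model.predict(preprocess_input(chips.astype("float32")))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize each chip vector

# the ten most similar chips to chip 0 (on unit vectors, euclidean order = cosine order)
nn = NearestNeighbors(n_neighbors=10).fit(feats)
distances, indices = nn.kneighbors(feats[[0]])
```
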
9. Chicago imagery: similarity example
10. Chicago crime events
   • Download a CSV of events: Jun 1, 2016 to Mar 27, 2017; ~300,000 crime events
   • Select a potentially useful field: “Primary Type” (e.g., Arson, Narcotics, Theft, Robbery)
   • Now what…?
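
A sketch of the subsetting step with pandas, assuming the Chicago Data Portal export with its standard "Date" and "Primary Type" columns (the filename is hypothetical):

```python
import pandas as pd

crimes = pd.read_csv("chicago_crimes.csv", parse_dates=["Date"])  # hypothetical filename
subset = crimes[(crimes["Date"] >= "2016-06-01") & (crimes["Date"] <= "2017-03-27")]
print(subset["Primary Type"].value_counts().head())  # e.g. Theft, Narcotics, Robbery, Arson
```
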
11. OpenStreetMap (OSM)
12. OpenStreetMap
   • ~900,000 polygons
   • Attributes:
     • Amenity (restaurant, cafe, parking, fast food, etc.)
     • Building (residential, house, school, church, etc.)
   • Again, now what…?
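
The deck does not say how the OSM data was pulled; one option is OSMnx (the tag set below is an assumption):

```python
import osmnx as ox

# fetch building and amenity features for the area (OSMnx >= 1.3 API)
gdf = ox.features_from_place("Chicago, Illinois, USA",
                             tags={"building": True, "amenity": True})
polygons = gdf[gdf.geometry.geom_type.isin(["Polygon", "MultiPolygon"])]
```
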
13. Chicago voronois
   • Create covering polygons to “roll up” data
   • Divide the bounding polygon using OSM “points”
   • Dense areas contain smaller Voronoi polygons
   • Mean area ~2 km²
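
A sketch of the Voronoi construction with SciPy and Shapely, assuming pts is an (n, 2) array of OSM point coordinates; real use would also clip the cells to the bounding polygon:

```python
from scipy.spatial import Voronoi
from shapely.geometry import Polygon

vor = Voronoi(pts)  # pts: (n, 2) array of OSM point coordinates (assumed loaded)

# keep the finite cells; dense point areas yield smaller polygons
cells = [Polygon(vor.vertices[region])
         for region in vor.regions
         if len(region) >= 3 and -1 not in region]
```
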
14. Chicago voronois: tie the data together
   • Point/polygon data:
     <Vor01> <hasCrimeType> <robbery>
     <Vor01> <hasOSMAmenity> <graveyard>
     <Vor01> <hasOSMBuilding> <school>
   • Wikipedia: <Vor01> <hasWikiSite> <Wiki100>
   • Imagery: <Vor01> <hasImage> <Img05>
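
The roll-up itself is a point-in-polygon join; a sketch with GeoPandas, where voronois (with a cell_id column) and crimes are GeoDataFrames assumed built earlier:

```python
import geopandas as gpd

# assign each crime event to the Voronoi cell that contains it
joined = gpd.sjoin(crimes, voronois, predicate="within")

triples = [(f"Vor{row.cell_id}", "hasCrimeType", row["Primary Type"])
           for _, row in joined.iterrows()]
```
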
15. Sample of the resulting graph
   [Graph diagram: Voronoi nodes (Vor 1, Vor 2) linked to crime types (Assault, Arson), OSM attributes (church, office), Wikipedia articles (Wiki 100, Wiki 54), and images (Img 1, Img 5)]
16. … Create embeddings from triples … (more in a minute)
17. t-SNE of the resulting embeddings
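
The 2-D view can be produced with scikit-learn's t-SNE; embeddings is the matrix of learned vectors, and the perplexity is a judgment call:

```python
from sklearn.manifold import TSNE

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
```
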
18. Example use: dentist office locations
19. Exploring Fused Embedding Space with Machine Learning
20. Representing graphs with vectors
   • We create embedding vectors (or more simply, embeddings) to represent entities, relationships, groups, and concepts in the graph
   • Embeddings are low-dimensional, dense, real-valued vectors
   • They can be generated in an unsupervised fashion (no labeling or annotation required)
   • Powerful, general-purpose representations of entities and relationships
   • Examples: a 3-dimensional embedding vector [2, 4, 5]; a 10-dimensional embedding vector [3, 1, -3, 4, 8, 2, -7, 21, 7, …]
21. Translational models for embedding graphs
   • Learn how to place entities in vector space
   • Inference using vector arithmetic
   • Common technique: TransE
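
A minimal TransE sketch in NumPy: true triples (h, r, t) are pushed toward h + r ≈ t with a margin ranking loss against corrupted triples. The dimensions, learning rate, and margin here are illustrative, not the deck's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_ent, n_rel, dim, lr, margin = 1000, 12, 50, 0.01, 1.0
E = rng.normal(scale=0.1, size=(n_ent, dim))   # entity embeddings
R = rng.normal(scale=0.1, size=(n_rel, dim))   # relation embeddings

def dist(h, r, t):
    """TransE score: small when head + relation lands near tail."""
    return np.linalg.norm(E[h] + R[r] - E[t])

def sgd_step(h, r, t):
    """One margin-ranking update against a corrupted (negative) tail."""
    t_neg = rng.integers(n_ent)
    if margin + dist(h, r, t) - dist(h, r, t_neg) <= 0:
        return  # the true triple already beats the negative by the margin
    g_pos = E[h] + R[r] - E[t]
    g_pos /= np.linalg.norm(g_pos) + 1e-9
    g_neg = E[h] + R[r] - E[t_neg]
    g_neg /= np.linalg.norm(g_neg) + 1e-9
    E[h] -= lr * (g_pos - g_neg)
    R[r] -= lr * (g_pos - g_neg)
    E[t] += lr * g_pos
    E[t_neg] -= lr * g_neg
```

Inference then amounts to a nearest-neighbor lookup: the most likely tail for (h, r, ?) is the entity whose embedding is closest to E[h] + R[r].
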
22. Learning embedding locations
   • As the model trains, embedding locations are learned and updated over time.
   • Eventually, the model converges and we can explore the resulting embedding space.
23. Embedding space
   • A low-dimensional embedding space can be visualized and investigated. (This task becomes difficult above three dimensions.)
24. Machine learning leverages embeddings
   • Supervised and unsupervised techniques over embeddings allow us to ask probabilistic questions: How much alike are these entities? What is the probability that relationship r holds between two entities?
   • Three uses: similarity search, classification, clustering
25. Similarity search
   • Given an entity of interest, what other entities in my data are most similar?
   • Multiple approaches, but with a learned embedding space we can answer this [almost] for free
   • Example: using vessel metadata + AIS track data, find vessels that are similar to one another
   • Query: Norwegian Jewel
26. Similarity search (continued)
   • Query: Norwegian Jewel
   • Most similar vessel: Carnival Elation
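
With a learned embedding space, similarity search is just a nearest-neighbor query; a cosine-similarity sketch (the entity names and variables are illustrative):

```python
import numpy as np

def most_similar(query, E, names, k=5):
    """Return the k entities whose embeddings are closest to the query's (cosine)."""
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = unit @ unit[names.index(query)]
    order = np.argsort(-sims)
    return [(names[i], float(sims[i])) for i in order[1:k + 1]]  # skip the query itself

# e.g. most_similar("Norwegian Jewel", E, vessel_names, k=1)
```
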
27. Vessel similarity
   • Blue heat map: shows all vessels
   • Orange heat map: the 30 most similar vessels to the Norwegian Jewel
   • Activity pattern: this vessel group’s activity pattern is quite distinct from the dominant blue pattern
28. Classification
   • Given a training set of entities and their labels, classify all other [unlabeled] entities in the data
   • Common techniques: Random Forest, k-Nearest Neighbors, SVM
   • Example: Hurricane Katrina relief in New Orleans: identify potential food distribution centers across the entire city (Random Forest classifier)
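
A sketch of the classification step with scikit-learn, assuming train_X/train_y hold the labeled Voronoi-cell embeddings and all_X holds every cell's embedding:

```python
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(train_X, train_y)  # labeled cells, e.g. good vs. not-good distribution sites

# score every Voronoi cell and surface the most promising sites
site_prob = clf.predict_proba(all_X)[:, 1]
candidates = site_prob.argsort()[::-1][:50]
```
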
29. Voronoi discretization of the physical space
   • Data sources: OpenStreetMap, overhead imagery
30. Voronoi discretization of the physical space (continued)
   • Voronoi covering of the region informed by both data sources (OpenStreetMap + overhead imagery)
31. Create training data
   • Green is labeled as a good site for a distribution center
   • Use these labeled examples to train the classifier
   • Map labels: Superdome, New Orleans Saints Training Facility
32. Use the trained classifier on all voronois
   • Red Voronoi embeddings were classified by the model as potential distribution center sites
33. Discover matching locations not investigated manually
   • Areas with large buildings and proximity to highways were identified
34. Applications in other domains
   • Retail: opening a new location
   • Agriculture: finding land optimal for your crop
   • Ecology: identifying regions with, or in danger of, erosion
35. Unsupervised learning: clustering
   • Goal: is there structure in my data?
   • Technique: k-means
     • Partition all embeddings into k groups
     • Each embedding belongs to the cluster with the nearest mean (the prototype embedding of the cluster)
   • Clustering results are not based on any specific labeling of each cell
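
The clustering step sketched with scikit-learn's k-means (k = 8 is illustrative; embeddings is the matrix of cell vectors):

```python
from sklearn.cluster import KMeans

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(embeddings)
labels = km.labels_  # each cell joins the cluster of its nearest mean
```
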
36. Technology: Building a Graph of Data
   • Unsupervised characterization of locations
   • By clustering locations based on their embeddings, we can generate informative partitions of an area
37. Technology: Building a Graph of Data (continued)
   • Example clusters identified: sea lanes, offshore platforms
38. Look us up at CCRi
   • Taryn Price: tprice@ccri.com
   • Courtney Shindeldecker: cshindeldecker@ccri.com
   • We’re hiring! Check out www.ccri.com for more info, blog, and sweet videos
   • Twitter: @ccr_inc

Editor's Notes

  • The numbers do not correspond to anything in the data; just a location in embedding space
  • Translational embedding models learn how to place entities in vector space such that the vectors pointing from one entity embedding to another represent the predicates joining those entities in the graph.
This property allows us to perform inference using vector arithmetic: given two elements of a triple, we can infer the third.
    The most commonly implemented translational model is known as TransE
  • Identified subgroup of vessels displaying different behavior from all other vessels
    Identified most or all major cruise line / large ship routes by querying about only 1 representative vessel
  • What are we doing here? Making a group of good examples and running k-NN on the group?
  • Approach: Train classifier on examples. Then classify all voronois in AO and return highest-scoring (most similar) voronois as potential locations
  • Green = training examples
    Red = predictive surface: voronois classified as good potential distribution centers
  • Gives general idea of where to look
  • Other applications: retail looking to expand to a new city, region, etc.: where should I open a new location?
    Agriculture: growing grapes: if you can get elevation/slope data, temperature, humidity, sunshine, soil acidity, etc., this could do that.
    Agriculture: if you want to grow a crop of a particular type and you know what conditions it likes, this can help you find the land you’d want to lease.
    Ecology: erosion identification and prediction.
    Key: this is not rule-based; you don’t have to tell it what you want. It learns about everything in your space and also about what you like, and based on what you tell it that you like, it infers the identifying features of positive examples and finds new ones *in places you may not have seen and for reasons you may not have thought about*. It could potentially uncover factors you didn’t know about that are contributing to your success.
  • Data: OSM + Imagery (voronois were made using just OSM)
  • Uncover non-apparent relationships (pink voronois)
