Big Data Day LA 2015 - Feature Engineering by Brian Kursar of Toyota (Data Con LA)
Producing highly accurate Predictive Models in Social Data Mining can be a challenge. Feature Engineering using traditional methodologies can only take you so far. Trying to find that needle in a haystack when the subject matter is too domain specific or prone to ambiguity can require large investments to achieve accurate results. Through this presentation we will discuss methodologies used by Toyota’s Research and Development Data Science Team and share secrets of building highly accurate Predictive Models for Social data using innovative techniques for Feature Engineering applied on the Apache Spark and MLlib platform.
Slides of my presentations at PyData NYC. This PDF is extracted from a Jupyter RISE slideset available at http://nbviewer.ipython.org/format/slides/github/lechatpito/PyDataNYC2015/blob/master/Word%20embeddings%20as%20a%20service%20-%20PyData%20NYC%202015%20%20.ipynb#/
[DevDay 2016] The toolkit for an amazing product - Speaker: Sebastian Sussman...DevDay.org
We all focus on the code while working with software. Every day we produce a lot of lines, but what is necessary to build an amazing product? Is developing and completing the requirements enough? How can we deliver the product on time? How do we build a productive and motivated team?
This session will provide some tools that can help the development team to build an amazing and successful product and to keep up with the deadline.
———
Speaker: Sebastian Sussman – CIO at Axon Active Vietnam
2. What makes a cow a cow?
Google knows
How do you know? Because other people know.
We think we know
“because it has four legs”
But the fact of the matter:
not all cows show four legs
nor are they brown …
not all…
3. What is the object in the middle?
No segmentation …
Not even the pixel values of the object …
21. Tag relevance by social annotation
Consistency in tagging between users on similar images.
Xirong Li, TMM 2009
22. Tag relevance by social annotation
Pretty good for snow, not so good for rainbow.
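The idea of tagging consistency on similar images can be sketched as neighbor voting: a tag is relevant to an image when its visual neighbors carry that tag more often than the tag's overall frequency would predict. The following is a minimal sketch under stated assumptions, not the method of the cited paper as published: the function name `tag_relevance`, the Euclidean distance, the "votes minus prior" score, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def tag_relevance(query_feature, features, tag_sets, tag, k=3):
    """Votes for `tag` among the k visually nearest images, minus the
    number of votes expected from the tag's overall frequency."""
    # Euclidean distance from the query image to every collection image.
    distances = np.linalg.norm(features - query_feature, axis=1)
    neighbors = np.argsort(distances)[:k]
    votes = sum(tag in tag_sets[i] for i in neighbors)
    # Prior: votes that k randomly drawn images would cast on average.
    prior = k * sum(tag in tags for tags in tag_sets) / len(tag_sets)
    return votes - prior

# Toy collection (made up): three "snow-like" and three "beach-like" images.
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
tag_sets = [{"snow"}, {"snow", "winter"}, {"snow"},
            {"beach"}, {"beach", "snow"}, {"beach"}]

query = np.array([0.05, 0.05])
snow_score = tag_relevance(query, features, tag_sets, "snow")
beach_score = tag_relevance(query, features, tag_sets, "beach")
print(snow_score, beach_score)
```

A positive score means the neighbors agree on the tag more than chance would suggest; a negative score means the tag is probably noise for this image.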
23. Social negative bootstrapping
Negative images are as important as positive images to learn.
Not just random negative images, but close ones.
• We want to learn positive examples from an expert, and obtain as many negative samples as we like for free from the web.
• We iteratively aim for the hardest negatives.
Xirong Li ACM MM 2009
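The iterative search for the hardest negatives can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited method itself: the stand-in classifier `train_linear` (a linear scorer between class means), the function name `bootstrap_negatives`, the parameters `rounds` and `per_round`, and the synthetic data are all hypothetical.

```python
import numpy as np

def train_linear(pos, neg):
    """Stand-in classifier: a linear scorer placed between the class means."""
    pos_mean, neg_mean = pos.mean(axis=0), neg.mean(axis=0)
    w = pos_mean - neg_mean
    b = -w @ (pos_mean + neg_mean) / 2.0
    return w, b

def bootstrap_negatives(pos, pool, rounds=3, per_round=5, seed=0):
    """Grow the negative set by repeatedly adding the pool items that the
    current classifier scores highest, i.e. the most misleading negatives."""
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(pool))
    # Start from a small random sample of negatives.
    chosen = rng.choice(remaining, size=per_round, replace=False)
    remaining = np.setdiff1d(remaining, chosen)
    for _ in range(rounds):
        w, b = train_linear(pos, pool[chosen])
        scores = pool[remaining] @ w + b
        # High score = looks like a positive = hard negative.
        hardest = remaining[np.argsort(scores)[::-1][:per_round]]
        chosen = np.concatenate([chosen, hardest])
        remaining = np.setdiff1d(remaining, hardest)
    return train_linear(pos, pool[chosen]), chosen

# Synthetic data (made up): expert positives and a large pool of negatives.
data_rng = np.random.default_rng(1)
positives = data_rng.normal(loc=2.0, size=(20, 2))
negative_pool = data_rng.normal(loc=0.0, size=(200, 2))

(model_w, model_b), chosen = bootstrap_negatives(positives, negative_pool)
print(len(chosen), "negatives selected")
```

Each round adds only negatives that the current model confuses with positives, so the training set stays small while the decision boundary is refined where it matters.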
26. acknowledgement
WordNet friends
Christiane Fellbaum
Dan Osherson (Princeton)
Kai Li (Princeton)
Alex Berg (Columbia)
Jia Deng (Princeton/Stanford)
Hao Su (Stanford)
27. PASCAL VOC
The PASCAL Visual Object Classes (VOC).
500,000 images downloaded from Flickr.
Queries like “car”, “vehicle”, “street”, “downtown”.
10,000 objects, 25,000 labels.
Mark Everingham, Luc Van Gool, Chris Williams, John Winn, Andrew Zisserman
28. 7. Conclusion
Data is king.
The data are beginning to reflect the human cognition capacity [at a basic level].
Harvesting social data requires advanced computer vision control.
37. Object localization 2008-2010
[Bar chart: Max AP (%) per category, axis from 0 to 60, for 2008, 2009, and 2010 methods. Categories: tvmonitor, pottedplant, bottle, motorbike, diningtable, horse, sofa, train, person, sheep, aeroplane, bicycle, cow, cat, boat, bus, dog, bird, car, chair.]
Results on 2008 data improve for 2010 methods for all categories, by over 100% for some categories.
39. Concept detection
Aircraft
Beach
Mountain
People marching
Police/Security
Flower
40. Measuring performance
[Diagram: the set of relevant items and the set of retrieved items overlap in the set of relevant retrieved items, shown next to a ranked result list (1. to 5.).]
• Precision
• Recall
Precision and recall have an inverse relationship.
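The two measures can be made concrete with a short sketch. Precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. The function name `precision_recall` and the toy ranked list below are hypothetical, chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved items and a set
    of relevant items."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # relevant retrieved items
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy ranked result list (made up); "f" is relevant but never retrieved.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}

# Evaluate at every cutoff k = 1..5 of the ranked list.
curve = [precision_recall(ranked[:k], relevant)
         for k in range(1, len(ranked) + 1)]
for k, (p, r) in enumerate(curve, start=1):
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

Going further down the ranked list can only keep or raise recall, while precision tends to fall, which is the inverse relationship noted on the slide.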
42. Performance doubled in just 3 years
• 36 concept detectors
Even when using training data of different origin, great progress.
But the number of concepts is still limited.
Snoek & Smeulders, IEEE Computer 2010
43. 8. Conclusion
Impressive results and quickly improving per year.
Very valuable competition.
Best non-classes start to make sense!
45. SURF based on integral images
Integral images were introduced by Viola & Jones in the context of face detection: sliding windows are evaluated on images integrated left to right and top to bottom.
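A minimal sketch of why integral images make sliding windows cheap: once the cumulative sums are computed, the sum over any rectangle takes four lookups, regardless of its size. The function names `integral_image` and `box_sum` and the zero-padding convention are assumptions made for this illustration, not code from SURF or from Viola & Jones.

```python
import numpy as np

def integral_image(image):
    """Cumulative sum left to right and top to bottom, padded with a zero
    row and column so that box sums need no boundary checks."""
    ii = np.zeros((image.shape[0] + 1, image.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = image.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of image[top:bottom, left:right] from four lookups."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

# Toy 4x4 image (made up) with pixel values 0..15.
image = np.arange(16).reshape(4, 4)
ii = integral_image(image)
total = box_sum(ii, 1, 1, 3, 3)
print(total, image[1:3, 1:3].sum())
```

The cost of a box filter therefore does not depend on the filter size, which is what makes large windows as cheap as small ones.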
50. Real-time bag of words
Stage                   Setting                                                      Time (ms)
Descriptor extraction   D-SURF 2x2                                                   15
Projection              Pre-projection: <empty>; actual projection: Random Forest    10
Classification          SVM kernel: RBF                                              13
MAP: 0.370
Total computation time is 38 milliseconds per image.
26 frames per second on a normal PC for any 20 concepts.
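The frame rate follows directly from the per-image time. The small check below assumes that the three figures on the slide (15, 10, 13) are per-stage times in milliseconds, in the order extraction, projection, classification; that assignment is an inference from their sum matching the stated total, not something the slide states.

```python
# Per-stage times in milliseconds (assignment to stages is an assumption).
stage_ms = {
    "descriptor extraction": 15,
    "projection": 10,
    "classification": 13,
}

total_ms = sum(stage_ms.values())    # time per image
frames_per_second = 1000 / total_ms  # images processed per second

print(f"{total_ms} ms per image, about {frames_per_second:.1f} frames per second")
```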
51. 9. Conclusion
SURF is scale and rotation invariant.
Fast due to the use of integral images.
Download: http://www.vision.ee.ethz.ch/~surf/
D-SURF extraction is 6x faster than Dense-SIFT.
Projection using Random Forest is 50x faster than NN.
52. Internet Video Search: the beginning
[Diagram labels: telling stories; measuring video features; concept detection; lexicon learning; browsing video.]
video