Changing the way people search with Apache Spark

Like humans, search engines will have an evolutionary brain that understands search behavior and learns from it. Search engines are fast becoming personal assistants by enabling meaningful contextual conversations. Search engines will provide real-time opinions and experiences from customers across the globe.


SEARCH TOMORROW WILL BE ‘HUMAN LIKE’

1. Evolutionary Brain:

Like humans, search engines will have an evolutionary brain that understands search behavior and learns from it. While machine learning is already used by search engines, we still have a long way to go to understand and learn from ‘mass-scale’ human search patterns. Deep learning techniques are evolving fast. It is becoming increasingly important to capture your customer’s imagination and attention with visuals, and search companies are taking notice.

2. Personal Assistant

Search engines are fast becoming personal assistants by enabling meaningful contextual conversations. For example, when you search for the status of your flight, the engine tells you that an old friend is travelling on the same flight. Search engines today provide personalized responses to queries like “what’s the status of my flight”. Search engine crawling is transitioning from being web based to IoT based. Search engines are moving away from being information providers to becoming personal assistants. In the near future, they may very well book your flight tickets, order a pizza and more.

3. Experiential Intelligence

Search engines will provide real-time opinions and experiences from customers across the globe. More and more people search online to learn about another person’s real-time experience: for example, how the food tastes today in a particular restaurant, or how congested a busy road is. Search engines of the future will provide real-time information on people’s experiences. It is almost like asking a customer how the coffee tastes today before you place your order.


CHANGING THE WAY PEOPLE SEARCH FOR CODE:

We learn code with Google:
Today’s smart engineers learn with Google. We search for code syntax, use cases and properties, and learn from the results. Many times we end up reading through irrelevant blogs and articles. As engineers, we look for solutions to a problem, with examples of how it was solved in the past and in what context a particular class was used.

Can we make code search more contextual?
So we asked ourselves: can we make code search more contextual and relevant for engineers, with real examples of how a particular piece of code was used in the past? This opens up new opportunities to explore all possible use cases of a specific class.
We built KodeBeagle. It makes code search contextual.
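
The idea of showing real examples of how a class was used can be sketched as a small usage index. This is only an illustration in plain Python, not KodeBeagle's implementation: the function name `build_usage_index`, the regular expression, and the toy Java file are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Matches simple Java imports such as "import java.io.FileReader;"
# (static and wildcard imports are deliberately ignored in this sketch).
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\.(\w+)\s*;")


def build_usage_index(files):
    """Map each imported class's simple name to the lines that mention it.

    `files` is a dict of {path: source text}. The result maps a class name
    to a list of (path, line number, stripped line) tuples.
    """
    index = defaultdict(list)
    for path, source in files.items():
        lines = source.splitlines()
        imported = {m.group(2) for line in lines if (m := IMPORT_RE.match(line))}
        for number, line in enumerate(lines, start=1):
            if IMPORT_RE.match(line):
                continue  # the import statement itself is not a usage example
            for name in imported:
                if re.search(rf"\b{re.escape(name)}\b", line):
                    index[name].append((path, number, line.strip()))
    return index


if __name__ == "__main__":
    # Hypothetical one-file corpus used only for demonstration.
    corpus = {
        "Reader.java": (
            "import java.io.FileReader;\n"
            "class Reader {\n"
            "  FileReader r = new FileReader(\"x\");\n"
            "}"
        )
    }
    for path, number, line in build_usage_index(corpus)["FileReader"]:
        print(f"{path}:{number}: {line}")
```

A query for a class name then returns concrete lines where that class appears, which is the kind of context a plain web search does not give.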

Published in: Data & Analytics

Changing the way people search with Apache Spark

  1. Private and confidential. Copyright (C) 2016, Imaginea Technologies Inc. All rights reserved. HOW IS APACHE SPARK CHANGING THE WAY PEOPLE SEARCH? INSIGHTS FROM IMAGINEA
  2. SERGEY BRIN ONCE SAID … “My vision when we started Google 15 years ago was that eventually you wouldn't have to have a search query at all.” - Sergey Brin, Google
  3. SEARCH TODAY IS PERSONAL
  4. SEARCH TODAY IS CONTEXTUAL. Query: “Best places to see in Goa”. Search from an iPhone user on a street in NY: contextual results include best places in Goa, flight charges to Goa, places to stay, etc. Search from an iPhone user on a street in Goa: contextual results include places to visit with distance from your location, restaurants nearby, etc.
  5. SEARCH TOMORROW WILL BE ‘HUMAN LIKE’: Experiential Intelligence, Personal Assistant, Evolutionary Brain
  6. Evolutionary Brain: Like humans, search engines will have an evolutionary brain that understands search behavior and learns from it. While machine learning is already used by search engines, we still have a long way to go to understand and learn from ‘mass-scale’ human search patterns. Deep learning techniques are evolving fast. It is becoming increasingly important to capture your customer’s imagination and attention with visuals, and search companies are taking notice. Check out our deep learning experiment >>
  7. Personal Assistant: Search engines are fast becoming personal assistants by enabling meaningful contextual conversations. For example, when you search for the status of your flight, the engine tells you that an old friend is travelling on the same flight. Search engines today provide personalized responses to queries like “what’s the status of my flight”. Search engine crawling is transitioning from being web based to IoT based. Search engines are moving away from being information providers to becoming personal assistants. In the near future, they may very well book your flight tickets, order a pizza and more. Check out how to crawl apps with deep linking techniques >>
  8. Experiential Intelligence: Search engines will provide real-time opinions and experiences from customers across the globe. More and more people search online to learn about another person’s real-time experience: for example, how the food tastes today in a particular restaurant, or how congested a busy road is. Search engines of the future will provide real-time information on people’s experiences. It is almost like asking a customer how the coffee tastes today before you place your order.
  9. CHANGING THE WAY PEOPLE SEARCH FOR CODE: OUR EXPERIMENT WITH APACHE SPARK
  10. We learn code with Google: Today’s smart engineers learn with Google. We search for code syntax, use cases and properties, and learn from the results. Many times we end up reading through irrelevant blogs and articles. As engineers, we look for solutions to a problem, with examples of how it was solved in the past and in what context a particular class was used.
  11. Can we make code search more contextual? So we asked ourselves: can we make code search more contextual and relevant for engineers, with real examples of how a particular piece of code was used in the past? This opens up new opportunities to explore all possible use cases of a specific class. We built KodeBeagle. It makes code search contextual. Explore KodeBeagle >>
  12. KodeBeagle leverages the power of Apache Spark to provide intelligent code suggestions. KodeBeagle shows the most idiomatic usages for any given code snippet. It leverages the abundantly available “standard” code on GitHub to learn interesting and useful coding patterns. It makes code search easy using natural language queries. It summarizes new projects and files to aid quick learning. Explore KodeBeagle >>
  13. Why did we choose Apache Spark for contextual search? Apache Spark is the next evolutionary step in big data processing: it provides batch as well as streaming capabilities, which makes it a preferred platform for fast analysis. We had to crawl through almost 1 billion lines of open source code from approximately 550,000 GitHub projects. Apache Spark gave us the processing speed and the flexibility required to build this platform.
  14. How does the platform work? Components: KodeBeagle crawlers, HDFS storage, Spark compute cluster, Elasticsearch cluster, KodeBeagle.com. (1) Crawlers clone GitHub repositories and store them in HDFS. (2) Spark processes the data and stores the results back in HDFS. (3) Elasticsearch is loaded with the processed data from HDFS. (4) A web server, acting as a load balancer and a firewall, exposes the Elasticsearch server to the web.
  15. Discovering themes using topic modelling: Topic modelling consists of a set of methods that collectively aim to discover the underlying themes within a set of documents. This technique doesn’t work if one wishes to analyze a large corpus, say all the Java projects on GitHub or all the pages of Wikipedia. To overcome this we developed many probabilistic topic models. These models leverage the statistical properties of the underlying data to discover the themes or ‘topics’ in that data. Know more about topic modelling in NLP >>
  16. Discovering intent using Latent Dirichlet Allocation (LDA) with Bayesian networks: LDA is a generative model that allows sets of observations to be explained by unobserved groups, which in turn explain why some parts of the data are similar. A Bayesian network is a kind of probabilistic graphical model that provides a principled way of representing and reasoning about possible relationships between random variables. We leveraged these techniques to tokenize the code repo into a collection of words and assign sensible values. Know more about LDA & Bayesian networks >>
  17. Discovering content using repo summaries with TASSAL: TASSAL is based on HIERSUM, which uses a hierarchical LDA-style model. It represents content specificity as a hierarchy of topic vocabulary distributions. It produces multiple ‘topical summaries’ to facilitate content discovery and navigation. We parsed each repo and built its AST. We then trained the topic model, which involves running the topic sampling algorithm for multiple iterations and performing hyperparameter optimization every k iterations. Once the model is trained, it can be applied to any repo for summarization. This method does not need any prior information about repos: it processes the repo contents and identifies the files that summarize the repo.
  18. IN SUMMARY … Real-time streaming analytics makes experiential intelligence possible. Contextual search is the future; start considering what steps to take. Intent is more than just ‘keywords done better’.
  19. SEARCH TOMORROW WILL BE ‘HUMAN LIKE’: Experiential Intelligence, Personal Assistant, Evolutionary Brain
  20. EXPERIENCE THE POWER OF APACHE SPARK WITH IMAGINEA: Imaginea is among the top contributors to Spark code; we have been building products on Spark since 2014; we are open-source contributors to Apache Hadoop and Zeppelin. To find out more, visit http://www.imaginea.com/apache-spark
  21. Disclaimer: This document may contain forward-looking statements concerning products and strategies. These statements are based on management's current expectations and actual results may differ materially from those projected, as a result of certain risks, uncertainties and assumptions, including but not limited to: the growth of the markets addressed by our products and our customers' products, the demand for and market acceptance of our products; our ability to successfully compete in the markets in which we do business; our ability to successfully address the cost structure of our offerings; the ability to develop and implement new technologies and to obtain protection for the related intellectual property; and our ability to realize financial and strategic benefits of past and future transactions. These forward-looking statements are made only as of the date indicated, and the company disclaims any obligation to update or revise the information contained in any forward-looking statements, whether as a result of new information, future events or otherwise. All trademarks and other registered marks belong to their respective owners. Copyright © 2012-2015, Imaginea Technologies, Inc. and/or its affiliates. All rights reserved. Credits: images under Creative Commons Zero license.
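
The tokenization and topic-modelling steps described in slides 15 to 17 can be sketched roughly as follows. This is a minimal illustration in plain Python using a collapsed Gibbs sampler for LDA; it is not KodeBeagle's Spark implementation and not TASSAL. The function names `tokenize` and `lda_gibbs`, the hyperparameter values, and the toy snippets are all assumptions made for the example.

```python
import random
import re
from collections import Counter


def tokenize(source):
    """Split source code into lowercase words, breaking camelCase identifiers."""
    words = []
    for ident in re.findall(r"[A-Za-z]+", source):
        # "FileInputStream" -> "File", "Input", "Stream"
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", ident)
        words.extend(p.lower() for p in parts)
    return words


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    Returns (theta, topic_word): theta[d] is the topic mixture of document d,
    topic_word[k] counts how often each word is assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word


if __name__ == "__main__":
    # Hypothetical snippets standing in for files from a crawled repo.
    snippets = [
        "FileInputStream in = new FileInputStream(path); in.read(buffer);",
        "BufferedReader reader = new BufferedReader(new FileReader(path));",
        "HttpClient client = HttpClient.newHttpClient(); client.send(request);",
        "HttpRequest request = HttpRequest.newBuilder(uri).build();",
    ]
    docs = [tokenize(s) for s in snippets]
    theta, topic_word = lda_gibbs(docs, num_topics=2)
    for k, counts in enumerate(topic_word):
        print("topic", k, [w for w, _ in counts.most_common(4)])
    for d, mixture in enumerate(theta):
        print("doc", d, [round(p, 2) for p in mixture])
```

On a real corpus the same sampling loop would be far too slow on one machine, which is one reason a distributed engine such as Spark is attractive for this kind of workload.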