16h00 globant - aws globant-big-data_summit2012


Published in: Technology, Business

  1. Big Data at Globant. Success Cases in AWS. Sabina A. Schneider
  2. What is Big Data?
  3. What is Data Science?
  4. Data Architecture; Enterprise Information Strategy; High Availability and Performance; NoSQL Distributed Solutions; Mission Critical; Product Positioning in the Market; Deeper insight about your Customers; Analytics and Alerts on KPIs; Cross-reference data with different sources
  5. Core Technologies
  6. BigData Ecosystem
  7. Scalable Architecture in the Cloud (diagram). Components shown: mobile devices in the cars; mobile devices; web clients; third-party integration; Elastic Load Balancer; auto-scaling web apps; singly; NoSQL DB; S3 bucket; CloudFront; EMR cluster with Hadoop and Pig; Storm real-time processing; analytics dashboard with trends. BigData: storage and processing.
  8. Metamarkets has developed a web-based analytics console that supports drill-downs and roll-ups of high-dimensional data sets (real-time bidding), comprising billions of events, in real time. The data store collects 10 GB of information every day and holds over 15 TB. Reports run on Hadoop and Hive on AWS infrastructure. The 40-instance cluster can scan, filter, and aggregate 1 billion rows in 950 milliseconds.
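To put the quoted figures in perspective, the arithmetic below derives the implied scan throughput. It is a rough sketch: the even split of work across instances is an assumption, not something the slide states.

```python
# Figures quoted on the slide.
ROWS_SCANNED = 1_000_000_000   # 1 billion rows
ELAPSED_SECONDS = 0.950        # 950 milliseconds
INSTANCES = 40                 # cluster size


def rows_per_second(rows: int, seconds: float) -> float:
    """Aggregate throughput of the whole cluster."""
    return rows / seconds


def per_instance_rate(rows: int, seconds: float, instances: int) -> float:
    """Throughput per instance, ASSUMING work is spread evenly."""
    return rows_per_second(rows, seconds) / instances


cluster_rate = rows_per_second(ROWS_SCANNED, ELAPSED_SECONDS)
instance_rate = per_instance_rate(ROWS_SCANNED, ELAPSED_SECONDS, INSTANCES)

print(f"cluster: {cluster_rate / 1e9:.2f} billion rows/s")
print(f"per instance: {instance_rate / 1e6:.1f} million rows/s")
```

Under that assumption each instance handles roughly 26 million rows per second.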
  9. Gree is a leading casual game development company. Globant developed a Hadoop-based architecture to store gaming events and generate telemetry information. These metrics are used to analyze and segment gamer profiles, estimate revenue, and perform predictive analysis on game performance.
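The kind of telemetry roll-up described here can be sketched in a few lines. The event schema, the segment names, and the spending threshold below are all hypothetical; the slide does not describe the real data model, and a production job would run on Hadoop rather than in memory.

```python
from collections import defaultdict

# Hypothetical event records; the real schema is not given in the slides.
EVENTS = [
    {"player": "p1", "type": "session_start", "amount": 0.0},
    {"player": "p1", "type": "purchase", "amount": 4.99},
    {"player": "p2", "type": "session_start", "amount": 0.0},
    {"player": "p3", "type": "purchase", "amount": 19.99},
    {"player": "p3", "type": "purchase", "amount": 9.99},
    {"player": "p1", "type": "session_start", "amount": 0.0},
]


def aggregate_by_player(events):
    """Roll raw events up into per-player session counts and revenue."""
    totals = defaultdict(lambda: {"sessions": 0, "revenue": 0.0})
    for event in events:
        stats = totals[event["player"]]
        if event["type"] == "session_start":
            stats["sessions"] += 1
        elif event["type"] == "purchase":
            stats["revenue"] += event["amount"]
    return dict(totals)


def segment(stats, threshold=10.0):
    """Assign a player to a spending segment (threshold is an assumption)."""
    if stats["revenue"] == 0:
        return "non_payer"
    return "high_spender" if stats["revenue"] >= threshold else "low_spender"


def revenue_by_segment(events):
    """Total revenue contributed by each segment."""
    result = defaultdict(float)
    for stats in aggregate_by_player(events).values():
        result[segment(stats)] += stats["revenue"]
    return dict(result)


per_player = aggregate_by_player(EVENTS)
per_segment = revenue_by_segment(EVENTS)
```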
  10. Product Positioning in the Market
      • Tweet collection on specific events (e.g., elections), integrated with a set of MapReduce-based queries
      • Data stored in a 20-node Hadoop cluster
      • Google Visualization tools for a widget-based dashboard
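A MapReduce-based query over collected tweets might look like the following in-memory sketch. It only illustrates the map, shuffle, and reduce phases; the tracked terms and sample tweets are hypothetical, and the actual queries from the project are not shown in the slides.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical terms to track; not taken from the slides.
TRACKED_TERMS = {"candidate_a", "candidate_b"}


def map_tweet(tweet):
    """Map phase: emit (term, 1) for each tracked term found in a tweet.

    A term repeated inside one tweet is counted once for that tweet.
    """
    words = {w.strip(".,!?#@").lower() for w in tweet.split()}
    return [(term, 1) for term in sorted(words & TRACKED_TERMS)]


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_mentions(tweets):
    pairs = chain.from_iterable(map_tweet(t) for t in tweets)
    return reduce_counts(shuffle(pairs))


TWEETS = [
    "Voting for #candidate_a today!",
    "candidate_b and candidate_a debate tonight",
    "No politics here",
]
mentions = count_mentions(TWEETS)
```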
  11. What?
      • Innovation for the financial market
      • Sentiment analytics on what is happening now and what can happen next in the market
      • Predictions one week in advance, based on comments on Twitter
      Challenges
      • Aggressive real-time analysis on social networks
      • Dashboarding that compares against real values from Yahoo Finance
      • Sentiment analysis and language filtering
      • Analytics predictions
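The slides do not say how sentiment analysis and language filtering were implemented. As one simple illustration, the sketch below filters tweets with a stopword-ratio heuristic and scores them against a tiny word lexicon. The word lists, the threshold, and the function names are all assumptions made for this example.

```python
# Hypothetical lexicons; a real system would use far larger resources.
POSITIVE = {"gain", "up", "bullish", "strong"}
NEGATIVE = {"loss", "down", "bearish", "weak"}
ENGLISH_STOPWORDS = {"the", "is", "and", "to", "of", "in"}

PUNCTUATION = ".,!?$#@"


def tokenize(text):
    """Lowercase words with surrounding punctuation removed."""
    tokens = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [t for t in tokens if t]


def looks_english(tokens, min_ratio=0.2):
    """Crude language filter: share of common English stopwords."""
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in ENGLISH_STOPWORDS)
    return hits / len(tokens) >= min_ratio


def sentiment_score(tokens):
    """Positive word count minus negative word count."""
    positive = sum(1 for t in tokens if t in POSITIVE)
    negative = sum(1 for t in tokens if t in NEGATIVE)
    return positive - negative


def score_stream(tweets):
    """Score only the tweets that pass the language filter."""
    scores = []
    for text in tweets:
        tokens = tokenize(text)
        if looks_english(tokens):
            scores.append(sentiment_score(tokens))
    return scores


SAMPLE = [
    "The market is strong and bullish",
    "El mercado está débil",
    "Shares down in the weak session",
]
scores = score_stream(SAMPLE)
```

The Spanish tweet is dropped by the filter, so only two scores come back.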
  12. Data Science: Recommendation; Classification; Sophisticated Mathematical Algorithm; Statistical Clustering Algorithm; Predictions on KPIs; Predictions on Metrics
  13. Moneygram Transaction Scoring
      Analysis of Moneygram historical transactional data labeled as Fraudulent/Non-Fraudulent
      • 8 years of transactional data to analyze
      Training on historical data using Support Vector Machines
      • Classification achieved using only a subset of the data, with soft margins (slack variables) to construct the dividing hyperplane
      • Possible use of kernel principal components to preprocess data and reduce the dimensionality of the training dataset
      • Avoids high computation times (sparse solution)
      Benefits
      • Detect fraudulent transactions with a higher level of accuracy
      • Increase in customer service satisfaction (fewer false positives)
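For reference, the soft-margin formulation mentioned on this slide is usually written as below. This is the standard textbook form, not a statement of the project's exact model. The symbols are assumptions for illustration: x_i is the feature vector of transaction i, y_i is its label (+1 or -1, for example fraudulent or not), ξ_i are the slack variables, and C is the penalty weight on margin violations.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n

% The solution is sparse: only the support vectors have nonzero multipliers.
w = \sum_{i=1}^{n}\alpha_i\,y_i\,x_i,\qquad \alpha_i \neq 0 \text{ only for support vectors}
```

The sparsity of the multipliers α_i is what lets classification depend on only a subset of the training data.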
  14. Shopping cart suggestion engine
      Generate suggestions based on client shopping history
      • Cluster a large dataset representing clients' shopping history using unsupervised learning algorithms.
      • Use information from a new or existing client to classify that client into the clustered shopping history of ALL clients.
      • Generate suggestions based on the cluster's shopping preferences.
      • Use Hadoop and Mahout for clustering and subsequent classification.
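The cluster, classify, and suggest steps above can be sketched with a plain k-means in pure Python. This stands in for the Hadoop and Mahout pipeline, which is not shown in the slides, and the slides do not say which clustering algorithm was used. The categories, purchase counts, and starting centroids are hypothetical.

```python
# Hypothetical data: purchase counts per category for each client.
CATEGORIES = ["books", "games", "garden"]
HISTORY = {
    "c1": [5, 1, 0],
    "c2": [4, 0, 0],
    "c3": [0, 6, 1],
    "c4": [1, 5, 0],
    "c5": [0, 0, 7],
    "c6": [0, 1, 6],
}


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else old for c, old in zip(clusters, centroids)]
    return centroids


def suggest(client_vector, centroids):
    """Classify a client into a cluster and suggest its top category."""
    centroid = centroids[nearest(client_vector, centroids)]
    best = max(range(len(centroid)), key=lambda i: centroid[i])
    return CATEGORIES[best]


points = list(HISTORY.values())
centroids = kmeans(points, centroids=[points[0], points[2], points[4]])
```

A new client is then scored with `suggest([3, 0, 1], centroids)`, which picks the dominant category of the closest cluster.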
  15. • Metadata word clustering using Solr
      • Content management and information sorting/categorization classified by location. Enhances performance at the view level.
      • Indexing of jwt content coming from different sources (internal and external), developed with Solr on Lucene. Integration with myJwt.com, the internal social network.
      • Organize the content storage: a service running in the Cloud that receives content, generates different assets (snapshots, thumbnails), and extracts metadata to be centralized in one place
      • myIdeas: collect ideas from creative designers in different locations and share a bonus among the bright ideas
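Lucene, on which Solr is built, is organized around an inverted index that maps each term to the documents containing it. The toy version below illustrates that idea only; it does not use Solr or Lucene APIs, and the document IDs and texts are hypothetical.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?"


def tokenize(text):
    """Lowercase words with surrounding punctuation removed."""
    tokens = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [t for t in tokens if t]


def build_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical content items from different sources.
DOCS = {
    "d1": "Campaign brief for London office",
    "d2": "Thumbnail assets for the London campaign",
    "d3": "Idea board, Buenos Aires",
}
index = build_index(DOCS)
```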
  16. Data Visualization. Our data visualization practice allows our customers to understand the evolution of key business drivers and trends, and to drill down into the root causes of deviations. Our HTML5 data visualization solution allows us to combine the flexibility of a custom-made solution with a fast time to market. It is based on standard widgets, allowing each user to customize the dashboard as required and view it on any device.
  17. Big Data Visualization Framework
  18. Cloud server; Browser; User input; Video streaming
  19. Kantar Media manages TV advertisements displayed on DirecTV US. We developed the addressable advertisement reporting solution, used by advertisers to plan and analyze the performance of addressable advertisements. Advertisements displayed on TV are customized to each user profile. The solution makes it possible to obtain reliable measurements from TV, analyzes the structure of the audience that has watched each advertisement, and allows evaluating the ROI of the marketing campaign.
  20. Touch-screen-based scorecard, used by top management to analyze and compare results from different countries and products.
  21. Thank you!