Big Data at Globant
Success Cases in AWS
Sabina A. Schneider
What is Big Data?
What is Data Science?
• Data Architecture
• Enterprise Information Strategy
• High Availability and Performance
• NoSQL Distributed Solutions
• Mission Critical

• Product Positioning in the Market
• Deeper insight about your Customers
• Analytics and Alerts on KPIs
• Cross-reference data with different sources
Core Technologies
BigData Ecosystem
Scalable Architecture in the Cloud
Diagram components:
• Clients: Mobile Devices in the cars, Mobile Devices, Web Client
• Elastic Load Balancer
• Web App (three instances), Auto scaling
• Third Party Integration
• NoSQL DB
• S3 Bucket
• CloudFront
• EMR Cluster
• Storm (Real Time processing)
• Hadoop, Pig
• Trends
• Analytics Dashboard, Web Client
• BigData – storage and processing
Metamarkets has developed a web-based analytics console that supports drill-downs and roll-ups of high-dimensional data sets (real-time bidding), comprising billions of events, in real time.

The data store collects 10 GB of information every day and holds over 15 TB.

Reports are generated with Hadoop and Hive on AWS infrastructure.

The 40-instance cluster can scan, filter, and aggregate 1 billion rows in 950 milliseconds.
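As a rough illustration of what "roll-up" and "drill-down" mean for event data, here is a minimal in-memory sketch. The event fields (`country`, `device`, `bids`) and the `aggregate` helper are hypothetical and are not taken from the Metamarkets console, which works on billions of events across a cluster.

```python
from collections import defaultdict

# Hypothetical bidding events; field names are illustrative assumptions.
events = [
    {"country": "US", "device": "mobile", "bids": 3},
    {"country": "US", "device": "desktop", "bids": 2},
    {"country": "BR", "device": "mobile", "bids": 4},
    {"country": "US", "device": "mobile", "bids": 1},
]

def aggregate(events, dimensions, where=None):
    """Scan, filter (optional `where`), and sum `bids` per combination of dimensions."""
    totals = defaultdict(int)
    for event in events:
        if where and any(event[key] != value for key, value in where.items()):
            continue  # filter step: skip events outside the selected slice
        group = tuple(event[d] for d in dimensions)
        totals[group] += event["bids"]
    return dict(totals)

# Roll-up: fewer dimensions, coarser totals.
rollup = aggregate(events, ["country"])

# Drill-down: pick one value and add a dimension for finer detail.
drilldown = aggregate(events, ["country", "device"], where={"country": "US"})

print(rollup)
print(drilldown)
```

A roll-up and a drill-down are the same scan, filter, and aggregate pass with a different choice of grouping dimensions, which is why a single fast aggregation primitive can serve both.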
Gree is a leading casual game development company.

Globant developed a Hadoop-based architecture to store gaming events and generate telemetry information. These metrics are used to analyze and segment gamer profiles, estimate revenue, and perform predictive analysis on game performance.
Product Positioning in the Market
• Collection of tweets on specific events (e.g., elections), integrated with a set of MapReduce-based queries

• Data stored in a 20-node Hadoop cluster

• Google Visualization tools for a widget-based dashboard
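The MapReduce-based queries over collected tweets can be pictured with a small single-process sketch of the map, shuffle, and reduce phases. The sample tweets and the hashtag-counting query are assumptions made for illustration; the actual queries ran on the Hadoop cluster and are not described in the slides.

```python
from collections import defaultdict

def map_phase(tweet):
    """Emit (hashtag, 1) for every hashtag in a tweet."""
    for token in tweet.lower().split():
        if token.startswith("#"):
            yield token, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts for one hashtag."""
    return key, sum(values)

# Hypothetical sample data.
tweets = [
    "Polls open today #elections",
    "Long lines this morning #Elections #vote",
    "Results expected tonight #vote",
]

pairs = [pair for tweet in tweets for pair in map_phase(tweet)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

On a real cluster the map and reduce functions keep this shape, while the framework distributes the input splits and performs the shuffle across nodes.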
What?
• Innovation for the financial market
• Sentiment analytics on what is happening now and what can happen next in the market
• Predictions one week in advance, based on comments on Twitter


Challenges
• Aggressive real-time analysis of social networks
• Dashboards comparing predictions with real values from Yahoo Finance
• Sentiment analysis and language filtering
• Analytics predictions
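One simple way to approach the sentiment analysis challenge is lexicon-based scoring. The sketch below is only an illustration under that assumption: the word lists, the `sentiment` and `daily_signal` helpers, and the sample texts are hypothetical. The slides do not say which technique was used, and language filtering is not covered here.

```python
# Hypothetical, tiny sentiment lexicons; a real system would use far larger ones.
POSITIVE = {"gain", "up", "strong", "beat"}
NEGATIVE = {"loss", "down", "weak", "miss"}

def sentiment(text):
    """Return (#positive words - #negative words) for one message."""
    words = [word.strip(".,!?#$").lower() for word in text.split()]
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

def daily_signal(messages):
    """Average sentiment over a batch of messages; 0.0 for an empty batch."""
    scores = [sentiment(message) for message in messages]
    return sum(scores) / len(scores) if scores else 0.0

batch = ["Strong earnings, stock up!", "Weak guidance, shares down."]
print([sentiment(message) for message in batch], daily_signal(batch))
```

An aggregated signal of this kind is what a dashboard could plot next to the real market values for comparison.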
Data Science
• Recommendation
• Classification
• Clustering
• Sophisticated Mathematical algorithm
• Statistical Algorithm

• Predictions on KPIs
• Predictions on Metrics
Moneygram Transaction Scoring
Analysis of Moneygram historical transactional data labeled as fraudulent or non-fraudulent

• 8 years of transactional data to analyze

Training Support Vector Machines on historical data

• Classification achieved with only a subset of the data (the support vectors), using soft margins (slack variables) to construct the dividing hyperplane
• Possible use of kernel principal components to preprocess the data and reduce the dimensionality of the training dataset
• Avoids high computation times (sparse solution)

Benefits
• Detect fraudulent transactions with a higher level of accuracy
• Increase customer service satisfaction (fewer false positives)
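To make the soft-margin idea concrete, here is a minimal sketch of a linear soft-margin SVM trained by subgradient descent on the hinge-loss objective 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w·x_i + b))). The two-feature toy data, the hyperparameters, and the function names are assumptions for illustration. The slides do not describe the actual training procedure, and no kernel or kernel PCA step is shown here.

```python
def train_soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Linear soft-margin SVM via batch subgradient descent (illustrative only)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # positive slack: point is inside the margin or misclassified
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (flag as fraudulent) or -1 (non-fraudulent)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical scaled features per transaction; labels: +1 fraudulent, -1 not.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_soft_margin_svm(X, y)
print(predict(w, b, [0.15, 0.15]), predict(w, b, [0.85, 0.85]))
```

Only points with margin below 1 contribute to the update, which is the sense in which the resulting hyperplane depends on a subset of the training data. The constant `C` trades margin width against slack.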
Shopping cart suggestion engine
Generate suggestions based on client shopping history

• Cluster a large dataset representing clients' shopping history using unsupervised learning algorithms.

• Use information from a new or existing client to classify that client into one of the clusters built from the shopping history of ALL clients.

• Generate suggestions based on the cluster's shopping preferences.

• Use Hadoop and Mahout for clustering and the subsequent classification.
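The three steps above (cluster, classify, suggest) can be sketched in memory with k-means. This is an assumption-laden illustration, not the Hadoop and Mahout implementation: the item catalogue, the purchase counts, the choice of k-means with k = 2, and the fixed starting centroids are all hypothetical.

```python
ITEMS = ["milk", "bread", "eggs", "beer", "chips"]  # hypothetical catalogue

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, vector):
    """Index of the closest centroid (the 'classify' step)."""
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k], vector))

def kmeans(vectors, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def suggest(centroids, basket, top_n=2):
    """Suggest the cluster's most-bought items that the client has not bought yet."""
    centroid = centroids[nearest(centroids, basket)]
    ranked = sorted(range(len(ITEMS)), key=lambda i: -centroid[i])
    return [ITEMS[i] for i in ranked if basket[i] == 0][:top_n]

# Hypothetical purchase counts per client, one column per entry in ITEMS.
history = [
    [3, 2, 2, 0, 0],
    [2, 3, 1, 0, 0],
    [2, 2, 3, 0, 1],
    [0, 0, 0, 3, 2],
    [0, 1, 0, 2, 3],
    [0, 0, 0, 3, 3],
]

centroids = kmeans(history, [history[0], history[-1]])
print(suggest(centroids, [1, 0, 0, 0, 0]))
```

Fixing the starting centroids keeps the example deterministic. A production run would choose the number of clusters and their initialisation from the data.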
•   Metadata word clustering using Solr

•   Content management and information sorting/categorization, classified by location.
    Enhances performance at the view level.

•   Indexing of JWT content coming from different sources (internal and external), developed
    with Solr on Lucene. Integration with myJwt.com, the internal social network.

      •   Organize content storage: a service running in the cloud that receives content,
          generates different assets (snapshots, thumbnails), and extracts metadata to be
          centralized in one place
      •   myIdeas: collect ideas from creative designers in different locations
          and share a bonus among the bright ideas
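The indexing described above relies on Solr and Lucene, whose core data structure is an inverted index mapping each term to the documents that contain it. The sketch below shows that idea in plain Python. It does not use the Solr or Lucene APIs, and the asset ids and metadata strings are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical extracted metadata, keyed by asset id.
docs = {
    "a1": "campaign brief Buenos Aires",
    "a2": "campaign thumbnails London",
    "a3": "brand brief London",
}

index = build_index(docs)
print(search(index, "campaign london"))
```

Centralizing extracted metadata in one index is what lets content from internal and external sources be searched and categorized, for example by location, in a single query.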
Data Visualization
Our data visualization practice allows our customers to understand the evolution of key business drivers and trends, and to drill down into the root causes of deviations.

Our HTML5 data visualization solution allows us to combine the flexibility of a custom-made solution with a fast time to market. It is based on standard widgets, allowing each user to customize the dashboard as required and view it on every device.
Big Data Visualization Framework
Diagram labels: Cloud server, Browser, User input, Video streaming
Kantar Media manages TV advertising displayed on DirecTV US.
We developed the addressable advertising reporting solution, used by advertisers to plan and analyze the performance of addressable advertising.
Advertising displayed on TV is customized to each user profile. The solution obtains reliable measurements from TV, analyzes the structure of the audience that has watched each advertisement, and allows advertisers to evaluate the ROI of the marketing campaign.
Touch-screen-based scorecard, used by top management to analyze and compare results from different countries and products.
Thank you!

16h00 globant - aws globant-big-data_summit2012
