Facebook Analytics with Elastic Map/Reduce
J Singh
A workshop on analyzing data about Facebook likes of a set of people
1.
Data + Algorithms = Knowledge
Facebook Analytics With Elastic Map/Reduce – a Hands-on Workshop
November 12, 2012
J Singh, DataThinks.org
2.
Take-away Messages
• Map Reduce is simple, Hadoop is one implementation of MR…
  – …made even simpler by services like Elastic Map Reduce
• But Map Reduce requires a different style of programming…
  – …and a different set of techniques for debugging
• Facebook data can get big very quickly…
  – …and storage and bandwidth costs can dominate your solution
• Analytics is an iterative (agile) process…
  – …each iteration requires evaluating results, and tuning the algorithms, possibly the acquisition of more data
© J Singh, 2012
3.
Signing Up for AWS
The steps required to obtain an AWS account:
Create an AWS account (http://aws.amazon.com).
  – http://www.slideshare.net/AmazonWebServices/video-how-to-sign-up-for-amazon-web-services-8700872
  – Requires a valid credit card and phone-based identification.
Sign in to the AWS Management Console
  – http://aws.amazon.com/console
4.
Elastic Map Reduce Resources
• Summary of the offering
• Elastic MapReduce Training
• Getting Started Guide
• Developers Guide
5.
MapReduce Conceptual Underpinnings
• Based on Functional Programming model
  – From Lisp
    • (map square '(1 2 3 4)) → (1 4 9 16)
    • (reduce plus '(1 4 9 16)) → 30
  – From APL
    • +/ N, where N ← 1 2 3 4
• Easy to distribute (based on each element of the vector)
• New for Map/Reduce: Nice failure/retry semantics
  – Hundreds and thousands of low-end servers are running at the same time
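The Lisp expressions on this slide can be mirrored in Python. This is a minimal sketch using the built-in `map` and `functools.reduce`; the helper name `square` is our own, chosen to match the Lisp example.

```python
from functools import reduce
from operator import add


def square(x):
    """Helper named after the Lisp example; not part of any library."""
    return x * x


# map applies square to each element independently, which is why
# the work is easy to spread across many machines.
squares = list(map(square, [1, 2, 3, 4]))

# reduce folds the mapped values into a single result with +.
total = reduce(add, squares)

print(squares)
print(total)
```

Each call to `square` depends only on its own element, so no coordination between elements is needed until the reduce step.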
6.
MapReduce Flow
7.
Elastic Map Reduce – Summary
• Hadoop installed and maintained by Amazon
  – We can focus on programming
  – Offers a few options on map and reduce programs
• Streaming
  – Map and Reduce programs connect through stdin and stdout
  – Allows Map and Reduce to be written in any language
• Hive, Pig
  – Translates to Map/Reduce JARs
  – Can cascade M/R pipelines
• Custom JAR – for special cases
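The Streaming contract described above (lines in, tab-separated key/value lines out, with the reducer seeing its input sorted by key) can be sketched in Python as follows. The function names `run_mapper` and `run_reducer` are illustrative, and the sort in the middle is only a local stand-in for what the framework does between the two phases.

```python
import io
from itertools import groupby


def run_mapper(lines, out):
    """Emit one 'key<TAB>1' line per whitespace-separated token."""
    for line in lines:
        for token in line.split():
            out.write(f"{token}\t1\n")


def run_reducer(sorted_lines, out):
    """Sum the values of consecutive lines that share a key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        out.write(f"{key}\t{sum(int(v) for _, v in group)}\n")


# Local stand-in for the framework: map, sort by key, then reduce.
mapped = io.StringIO()
run_mapper(["a b a\n", "b a\n"], mapped)
shuffled = sorted(mapped.getvalue().splitlines(keepends=True))
reduced = io.StringIO()
run_reducer(shuffled, reduced)
print(reduced.getvalue(), end="")
```

In an actual streaming step the two functions would live in separate scripts and be called with `sys.stdin` and `sys.stdout` instead of the in-memory buffers used here.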
8.
Elastic Map Reduce – Architecture
• Starting with data in S3
• EMR Service initiates the job
• Hadoop Master coordinates operation
• Slave nodes are initiated and data loaded into them
• Extra nodes can be invoked if needed
• Results are copied back into S3
  – Nodes are destroyed
9.
Elastic Map Reduce – Word Count
• Use the AWS Management Console >> Elastic MapReduce
  – Define Job Flow
    • Hadoop Version 1.0.3
    • Run your own application – Streaming
  – Specify Parameters
    • For input files, elasticmapreduce/samples/wordcount/input
    • For output files, you need to define your own S3 bucket
      – In a separate browser tab, AWS Management Console >> S3
      – Bucket names can include lowercase letters, numbers, period, dash
• Mapper code can be seen at http://goo.gl/EbCme
  – Copy this code to one of your buckets
  – Specify path <your-bucket>/wordSplitter.py
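For readers without access to the linked mapper, here is a hedged sketch of what a word-splitting streaming mapper might look like. It is not a copy of the linked code. It assumes the job uses Hadoop Streaming's built-in `aggregate` reducer, which sums values for keys carrying the `LongValueSum:` prefix; the definition of a "word" and the name `map_line` are our own choices.

```python
import re

# Assumption: a "word" is a run of ASCII letters.
WORD = re.compile(r"[A-Za-z]+")


def map_line(line):
    """Return the streaming output lines for one input line."""
    return [f"LongValueSum:{w.lower()}\t1" for w in WORD.findall(line)]


for out_line in map_line("Hello, hello world"):
    print(out_line)
```

With the `aggregate` reducer no reduce script has to be written; with a custom reducer the prefix would be dropped and the summing done by hand.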
10.
Elastic Map Reduce – Word Count (p2)
• Configure EC2 Instances
• Advanced Options
  – Optional: Amazon EC2 Key Pair
    • To log into the master and make changes to a running job
      – E.g., add extra nodes to speed up processing
  – Amazon S3 Log Path
    • <your-bucket>/log-2012-11-12--19-30
• Accept all other defaults and go!
11.
Monitoring Operation
• AWS Management Console provides a view into the operation
  – These screen-shots were taken at minute 27 of a 30-minute run
  – Configuration default in this case was for 2 map slots
  – First slot became available at 12:00, second around 12:10
12.
Elastic Map Reduce – Debugging
• AWS console and the log files provide clues on what went wrong and how to fix it
• Make a change that will break the operation and examine the AWS console to find the error you introduced
  – Introduce a parsing error in the mapper program
  – Uncomment these lines to have it raise an exception
      import random
      x = 1 / random.randint(0,1000)
  – Save the file to an S3 bucket and run
  – Can you find where EMR reveals what happened?
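The injected line fails only when `random.randint(0, 1000)` returns 0, which is one of 1001 equally likely values. The sketch below works out how likely at least one failure is, assuming (our assumption, not stated on the slide) that the line runs once per input record; `failure_probability` is an illustrative name.

```python
def failure_probability(n_records):
    """Chance that at least one of n_records draws is 0 (1001 equally likely values)."""
    return 1.0 - (1000.0 / 1001.0) ** n_records


for n in (1, 100, 1000, 10000):
    print(n, round(failure_probability(n), 4))
```

If the line instead runs once per mapper process, `n_records` should be read as the number of mapper launches, and a run may well finish without the exception ever firing.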
13.
Facebook Analytics – Summary
• Extend the architecture
  – Import Facebook data into S3
  – Change Map Reduce programs as required
14.
Facebook Analytics – Observations
• Fetching and staging data is the real challenge in putting together an analytics solution
  – For unstructured data, it requires
    • An understanding of the data model at the source
    • Custom code to read it
  – For structured data, consider Pig/Hive (higher-level Hadoop components)
    • Pig/Hive can read/write tables formatted as CSV/TSV files in S3
      – Either we need to bring files into S3
      – Or point Pig/Hive at a JDBC connection
• An opportunity to rethink the ETL pipeline?
15.
Facebook Analytics – Data Collection
• The exercise is based on everyone's Facebook data
• Log into http://apps.facebook.com/map-reduce-workshop
  – Requires permission to get
    • Information about you,
    • Your friends,
    • Your likes, your friends' likes.
  – Randomly selects 10 of those friends
  – Randomly selects 25 of their likes
  – Anonymizes your friends' Facebook IDs before storing into S3
• All data, even though opaque, will be deleted at the end of the workshop
16.
Facebook Analytics – Data Collected
Original = 75
Friends = 750
Likes = up to about 20,000
• Each user record shows anonymized user ID and their likes
  – 4110002004281 ['21506845769', '345722385482735', '93433060687']
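A record in the format shown above can be parsed with the standard library. This is a minimal sketch that assumes the user ID and the list are separated by whitespace (the slide does not say whether it is a tab or a space) and that the list is a Python-style literal; `parse_user_record` is an illustrative name.

```python
import ast


def parse_user_record(line):
    """Split 'user_id <list literal>' into (user_id, [like_id, ...])."""
    user_id, likes_text = line.strip().split(None, 1)
    likes = ast.literal_eval(likes_text)  # parses the literal without executing code
    return user_id, [str(like) for like in likes]


record = "4110002004281 ['21506845769', '345722385482735', '93433060687']"
user_id, likes = parse_user_record(record)
print(user_id, len(likes))
```

`ast.literal_eval` is used rather than `eval` because the input comes from files that the mapper does not control.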
17.
Facebook Analytics – Likes Count
• Use the AWS Management Console >> Elastic MapReduce
  – Define Job Flow
    • Hadoop Version 1.0.3
    • Run Your Own Application – Streaming
  – Specify Parameters
    • For input files, use bucket datathinks-users
    • For output files, you need to define your own S3 bucket
      – In a separate browser tab, AWS Management Console >> S3
    • Mapper: copy goo.gl/PcLK4 into a bucket you own
  – Advanced options:
    • Choose a fresh log file location
  – Accept all other defaults and go!
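The likes count follows the same pattern as word count, with like IDs in place of words. The sketch below is not the linked mapper; it assumes the record format shown on slide 16, uses made-up sample records, and replaces the reduce phase with a local `Counter`. The names `map_likes` and `count_likes` are illustrative.

```python
import ast
from collections import Counter


def map_likes(line):
    """Yield (like_id, 1) for every like in one user record."""
    _user_id, likes_text = line.strip().split(None, 1)
    for like_id in ast.literal_eval(likes_text):
        yield str(like_id), 1


def count_likes(lines):
    """Local stand-in for the reduce step: sum the 1s per like ID."""
    counts = Counter()
    for line in lines:
        for like_id, one in map_likes(line):
            counts[like_id] += one
    return counts


# Made-up records in the slide-16 format.
sample = [
    "1001 ['21506845769', '93433060687']",
    "1002 ['21506845769']",
]
print(count_likes(sample))
```

The user ID is discarded in the map step because the question being asked is how often each like occurs, not who holds it.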
18.
Viewing the Results
• The results of Data Analysis are available in S3.
  – Partial example:
      139784736075551 1
      140413412750046 6
      184331976202 3
      220854914702193 1
      29092950651 1
• How to interpret the results.
  – Sort by frequency, then examine most frequent likes
    • 140413412750046 is cryptic
    • But http://www.facebook.com/pages/w/140413412750046 reveals what it is (DataThinks)
• Requires further action: what to do with the results?
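The "sort by frequency" step can be done locally once the output files are downloaded from S3. This sketch uses the five lines from the partial example above and assumes whitespace-separated `like_id count` pairs; `parse_counts` is an illustrative name.

```python
results_text = """139784736075551 1
140413412750046 6
184331976202 3
220854914702193 1
29092950651 1"""


def parse_counts(text):
    """Turn 'like_id count' lines into (like_id, count) pairs."""
    pairs = []
    for line in text.splitlines():
        like_id, count = line.split()
        pairs.append((like_id, int(count)))
    return pairs


# Highest count first; ties broken by like ID so the order is deterministic.
ranked = sorted(parse_counts(results_text), key=lambda p: (-p[1], p[0]))
for like_id, count in ranked:
    print(like_id, count)
```

The first entry of `ranked` is the like ID that the slide goes on to look up on Facebook.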
19.
Algorithm Discussion
• The algorithm based on exact matches for likes may be too restrictive
  – 'Ella Fitzgerald' != 'Duke Ellington'
  – But people who like Ella Fitzgerald may be reachable the same way as people who like Duke Ellington
  – An idea to explore further:
    • Is there a way to find IDs that we might consider equivalent?
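If an equivalence table between like IDs were available, counting by group instead of by exact ID would be a small change. The sketch below only illustrates that change: the table, its IDs, and the group label are invented, and the deck does not say how such a table would be built.

```python
from collections import Counter

# Hypothetical equivalence table: like ID -> group label. All entries are invented.
EQUIVALENT = {
    "like_ella": "jazz",
    "like_duke": "jazz",
}


def group_counts(like_ids):
    """Count likes per group, falling back to the raw ID when no group is known."""
    return Counter(EQUIVALENT.get(like_id, like_id) for like_id in like_ids)


print(group_counts(["like_ella", "like_duke", "like_other"]))
```

With exact matching the two jazz likes would each count once; with grouping they reinforce each other.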
20.
Data Collected and Embellished
Original = 75
Friends = 750
Likes = 15,000
Similar Likes = 150,000
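The growth from one stage to the next can be checked with simple arithmetic. The sketch below uses only the figures on this slide and the sampling limits from slide 15 (10 friends per user, 25 likes per friend); attributing all likes to friends is our assumption.

```python
original_users = 75
friends = 750
likes = 15_000
similar_likes = 150_000

friends_per_user = friends / original_users   # sampled friends per workshop user
likes_per_friend = likes / friends            # average likes kept per friend
similar_per_like = similar_likes / likes      # average similar likes per like

# Upper bound on likes if every friend had the full 25 sampled likes.
max_likes = friends * 25

print(friends_per_user, likes_per_friend, similar_per_like, max_likes)
```

Each stage multiplies the data volume by roughly ten or more, which is the point behind the take-away that Facebook data gets big quickly.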
21.
Extended Facebook Analytics – Summary
• Extend the architecture
  – Get mappers to fetch “similar likes” from the internet
22.
Facebook Analytics – Showing Results
• The other challenge in putting together an analytics solution is displaying results
  – Demo of our results page
23.
Take-away Messages
• Map Reduce is simple, Hadoop is one implementation of MR…
  – …made even simpler by services like Elastic Map Reduce
• But Map Reduce requires a different style of programming…
  – …and a different set of techniques for debugging
• Facebook data can get big very quickly…
  – …and storage and bandwidth costs can dominate your solution
• Analytics is an iterative (agile) process…
  – …each iteration requires evaluating results, and tuning the algorithms, possibly the acquisition of more data
24.
Thank you
• J Singh
  – President, Early Stage IT
    • Technology Services and Strategy for Startups
• DataThinks.org is a service of Early Stage IT
  – “Big Data” analytics solutions
Editor's Notes
Get started with Hadoop