Welcome to the Birmingham Big Data Science Group (BIDS)
  • Faizan Javed, 5/25/2011, Intermark Group
  • Sponsor: Intermark Group
BIDS Stats
  • Founded April 10, 2011
  • 9 members (and counting)
  • Founder: Faizan Javed; Co-Founder: Qasim Ijaz
  • Online presence:
      • Meetup.com, for coordinating meetups: http://www.meetup.com/bham-bids
      • LinkedIn (related articles and announcements): http://www.linkedin.com/groups/Birmingham-Big-Data-Science-Group-3865219
      • Facebook (related articles and announcements): http://www.facebook.com/home.php?sk=group_202221519811444
Agenda
  • What is Big Data?
  • Quick overview of related technologies:
      • Large-scale distributed systems and platforms
      • NoSQL data stores
      • Intelligent algorithms/web-mining/information retrieval techniques
      • Highly scalable systems
What is Big Data?
  • More people are connected to the internet.
  • Social media explosion (Web 2.0): Facebook, Twitter, etc.
  • Huge volumes of data are being collected: sensors, mobile devices, machine-to-machine communications, social media, and retail-site web logs for browsing patterns.
  • "Big" in Big Data is relative: today's "big" is certainly tomorrow's "medium" and next week's "small."
  • "Big Data" is when the size of the data itself becomes part of the problem.
  • Going from gigabytes to petabytes!
  • Source: http://radar.oreilly.com/2010/06/what-is-data-science.html
Big Data, Big Numbers
  • McKinsey report, May 2011: http://www.mckinsey.com/mgi/publications/big_data/index.asp
Why care about big data?
  • Deep analysis of data can be a competitive advantage.
  • More data makes it easier to find consistent patterns.
  • More data usually beats better algorithms.
  • Ex 1: Predict customer preferences and target ads on an e-commerce website.
  • Ex 2: Improve search quality.
  • Ex 3: Bank risk modeling (aggregate customer activity from different lines of business).
  • Sources:
      • http://blog.mikepearce.net/2010/08/18/10-hadoop-able-problems-a-summary/
      • http://www.ft.com/intl/cms/s/0/64095dba-7cd5-11e0-994d-00144feabdc0.html#axzz1NHn8icSC
  • Key point: "Many different sources" & "unstructured data"
Big Players on the Big Data Scene
  • The Government
  • http://us1.campaign-archive1.com/?u=4cb4c08d876d7481bbc4bc70f&id=6889126aef
The need for new techniques
  • Traditional "relational" techniques break down at scale.
  • Solutions:
      • NoSQL databases: Cassandra, HBase, Riak, etc.
      • Large-scale "commodity" scale-out distributed computing techniques: MapReduce/Hadoop, Percolator, etc.
      • Analytics platforms: IBM BigInsights, EMC Greenplum
The NoSQL revolution
  • http://www.infoq.com/news/2011/04/newsql
Prominent NoSQL database users
  • Cassandra: Facebook, Twitter, Rackspace, Reddit, Digg.com
  • Riak: Mozilla, Ask.com, Comcast
  • Voldemort: LinkedIn
  • MongoDB: Foursquare, Etsy, bit.ly, Intuit
  • HBase: StumbleUpon, Twitter, Infolinks, Adobe, Meetup.com
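Hadoop-based SMAQ stack
  • http://radar.oreilly.com/2010/09/the-smaq-stack-for-big-data.html
  • MapReduce reducer that sums the counts emitted for each key:

    // Imports belong at the top of the enclosing job class file.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Nested inside the job's driver class.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();  // add up every count emitted for this key
            }
            context.write(key, new IntWritable(sum));  // emit (key, total)
        }
    }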
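The reducer on this slide is only one stage of a MapReduce job. As a rough illustration of how the map, shuffle, and reduce stages fit together for a word count, here is a minimal in-memory sketch in plain Java. It does not use Hadoop at all; the class name `WordCountSketch` and its method names are hypothetical and chosen only for this example, and a real job would distribute each stage across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of map -> shuffle -> reduce; not the Hadoop API.
class WordCountSketch {

    // Map phase: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values for one key, mirroring the slide's reducer.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: run the three phases in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {algorithms=1, beats=1, big=2, data=2, is=1}
        System.out.println(wordCount(List.of("big data is big", "data beats algorithms")));
    }
}
```

The point of the split is that `map` and `reduce` each look at one line or one key at a time, which is what lets a framework such as Hadoop run them in parallel on many machines.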
Hadoop-based SMAQ stack
  • Hadoop comes with HDFS, the Hadoop Distributed File System.
  • Can be used alongside various NoSQL systems (HBase is the most common).
Hadoop-based SMAQ stack
  • Pig (Yahoo):

    input = LOAD 'input/sentences.txt' USING TextLoader();
    words = FOREACH input GENERATE FLATTEN(TOKENIZE($0));
    grouped = GROUP words BY $0;
    counts = FOREACH grouped GENERATE group, COUNT(words);
    ordered = ORDER counts BY $0;
    STORE ordered INTO 'output/wordCount' USING PigStorage();

  • Hive (Facebook):

    INSERT OVERWRITE TABLE xyz_com_page_views
    SELECT page_views.*
    FROM page_views
    WHERE page_views.date >= '2008-03-01'
      AND page_views.date <= '2008-03-31'
      AND page_views.referrer_url LIKE '%xyz.com';
Next-generation systems: going beyond MapReduce/Hadoop
  • http://www.nytimes.com/external/gigaom/2010/10/23/23gigaom-beyond-hadoop-next-generation-big-data-architectu-81730.html
  • Mostly Google and Yahoo innovations.
  • Percolator: "real-time" MapReduce. Powers Google Instant.
  • Dremel: super-fast "Hive" for interacting with large datasets. In-house at Google.
  • Pregel: highly efficient graph computing for analyzing social graphs. In-house at Google. Open-source projects available.
  • Megastore: scalable NoSQL-like system with ACID semantics but lower consistency across partitions. In-house at Google.
  • Next-gen Hadoop at Yahoo: enhanced scalability (going beyond 4,000 nodes per cluster), support for multiple programming paradigms, enhanced cluster utilization.
Intelligent Web & machine learning
  • Recommendation systems, data/web mining, natural language processing
  • Recommendation systems:
      • A type of information filtering/retrieval technique, commonly built on collaborative filtering.
      • Use user profiles, ratings, and browsing habits to recommend items not yet considered.
      • First made famous in the commercial arena by Amazon.com.
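To make the collaborative filtering idea concrete, the following is a minimal sketch of user-based collaborative filtering in plain Java. It is not how Amazon.com or any other site named in this deck implements recommendations; the class `RecommenderSketch`, the sample users, and the ratings are all hypothetical. It scores each item the target user has not rated by a similarity-weighted average of other users' ratings, using cosine similarity between rating vectors.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative user-based collaborative filtering; all data is made up.
class RecommenderSketch {

    // Cosine similarity between two sparse rating vectors (item -> rating).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Returns the unrated item with the highest similarity-weighted average
    // rating, or null if no other user rated anything new to the target.
    static String recommend(String target, Map<String, Map<String, Double>> ratings) {
        Map<String, Double> mine = ratings.get(target);
        Map<String, Double> weighted = new HashMap<>();
        Map<String, Double> simTotal = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> user : ratings.entrySet()) {
            if (user.getKey().equals(target)) {
                continue;
            }
            double sim = cosine(mine, user.getValue());
            if (sim <= 0) {
                continue; // ignore users with nothing in common
            }
            for (Map.Entry<String, Double> r : user.getValue().entrySet()) {
                if (mine.containsKey(r.getKey())) {
                    continue; // only score items the target has not rated
                }
                weighted.merge(r.getKey(), sim * r.getValue(), Double::sum);
                simTotal.merge(r.getKey(), sim, Double::sum);
            }
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String item : weighted.keySet()) {
            double score = weighted.get(item) / simTotal.get(item);
            if (score > bestScore) {
                bestScore = score;
                best = item;
            }
        }
        return best;
    }

    // Hypothetical ratings on a 1-5 scale.
    static Map<String, Map<String, Double>> sampleRatings() {
        return Map.of(
            "alice", Map.of("hadoop", 5.0, "cassandra", 4.0),
            "bob", Map.of("hadoop", 5.0, "cassandra", 5.0, "pig", 4.0, "hive", 2.0),
            "carol", Map.of("hadoop", 1.0, "cassandra", 2.0, "hive", 5.0));
    }

    public static void main(String[] args) {
        System.out.println(recommend("alice", sampleRatings()));
    }
}
```

In this sample, alice's ratings resemble bob's far more than carol's, so bob's high rating for "pig" outweighs carol's high rating for "hive". Production systems typically add rating normalization, neighborhood limits, and item-based variants, none of which are shown here.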
Amazon.com & Netflix recommendation systems
Foursquare (3/2011) and Google Places (5/2011)
  • http://engineering.foursquare.com/2011/03/22/building-a-recommendation-engine-foursquare-style/
  • http://places.blogspot.com/2011/05/discover-more-places-youll-like-based.html
Hot area! Netflix and Overstock.com competitions
Search Engines (Google, Bing, Wolfram, Lucene/Nutch, etc)
Search innovations @ LinkedIn
  • http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/
  • http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
  • Uses the open-source Lucene project for social graph search and real-time indexing and searching.
  • Faceted search: dynamic filters are generated automatically from your query results!
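One way to picture "dynamic filters generated from query results" is as a count of field values over the documents a query matched. The sketch below is plain Java and does not use Lucene or reflect LinkedIn's implementation; the class `FacetSketch`, the field names, and the sample profiles are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative facet counting over already-matched search results.
class FacetSketch {

    // For each requested field, count how many results carry each value.
    static Map<String, Map<String, Integer>> facets(
            List<Map<String, String>> results, List<String> fields) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String field : fields) {
            Map<String, Integer> perValue = new TreeMap<>();
            for (Map<String, String> doc : results) {
                String value = doc.get(field);
                if (value != null) {
                    perValue.merge(value, 1, Integer::sum);
                }
            }
            counts.put(field, perValue);
        }
        return counts;
    }

    // Hypothetical results for some people-search query.
    static List<Map<String, String>> sampleResults() {
        return List.of(
            Map.of("name", "A", "company", "Acme", "location", "Birmingham"),
            Map.of("name", "B", "company", "Acme", "location", "Atlanta"),
            Map.of("name", "C", "company", "Globex", "location", "Birmingham"));
    }

    public static void main(String[] args) {
        System.out.println(facets(sampleResults(), List.of("company", "location")));
    }
}
```

Each value and its count would become one clickable filter (for example "Acme (2)"), and selecting it would rerun the query restricted to that value. Because the counts come from the current result set, the filters change with every query.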
Conclusion
  • Big Data is a very challenging and promising area.
  • Can be used to get a competitive advantage.
  • Usually brings about advances in computer science.
  • Vast area of topics:
      • NoSQL systems, large-scale distributed computing systems, highly scalable web system designs
      • Machine learning techniques: search engines, recommender systems

1st Birmingham Big Data Science Group meetup

  • 1. Welcome to the Birmingham Big Data Science Group (BIDS) Faizan Javed 5/25/2011 Intermark Group Sponsor: Intermark Group
  • 2. BIDS Stats Founded April 10, 2011 9 members (and counting..) Founder: Faizan Javed, Co-Founder: QasimIjaz Online presence: Meetup.com for co-ordinatingmeetups: http://www.meetup.com/bham-bids Also on (for related articles and announcements): LinkedIn: http://www.linkedin.com/groups/Birmingham-Big-Data-Science-Group-3865219 Facebook:http://www.facebook.com/home.php?sk=group_202221519811444
  • 3. Agenda What is Big Data? Quick overview of related technologies: Large-scale distributed systems and platforms NoSQL data stores Intelligent algorithms/web-mining/information retrieval techniques Highly-scalable systems
  • 4. What is Big Data? More people connected to the internet Social media explosion (Web 2.0): Facebook, Twitter, etc. Huge volumes of data being collected: sensors, mobile devices, machine-to-machine communications, social media and retail sites web logs for browsing patterns “Big” in Big Data is relative:  today's "big" is certainly tomorrow's "medium" and next week's "small.“ “Big Data" is when the size of the data itself becomes part of the problem. Going from Gigabytes to Petabytes!http://radar.oreilly.com/2010/06/what-is-data-science.html
  • 5.
  • 6. Big Data, Big Numbers McKinsey report, May 2011: http://www.mckinsey.com/mgi/publications/big_data/index.asp
  • 7. Why care about big data? Deep analysis of data can be a competitive advantage. More data  easier to find consistent patterns More data usually beats better algorithms Ex 1: Predict customer preferences and target ads on an ecommerce website. Ex 2: Improve search quality. Ex 3: Bank risk modeling (aggregate customer activity from different lines of businesses) http://blog.mikepearce.net/2010/08/18/10-hadoop-able-problems-a-summary/ http://www.ft.com/intl/cms/s/0/64095dba-7cd5-11e0-994d-00144feabdc0.html#axzz1NHn8icSC Key point: “Many different sources” & “unstructured data”
  • 8. Big Players on the Big Data Scene The Government http://us1.campaign-archive1.com/?u=4cb4c08d876d7481bbc4bc70f&id=6889126aef
  • 9. The need for new techniques Traditional “relational” techniques break down at scale. Solutions: NoSQL databases: Cassandra, HBase, Riak, etc. Large-scale “commodity” scale-out distributed computing techniques: MapReduce/Hadoop, Percolator, etc. Analytics platforms: IBM BigInsights, EMC Greenplum
  • 11. Prominent NoSQL database users Cassandra: Facebook, Twitter, Rackspace, Reddit, Digg.com Riak: Mozilla, Ask.com, Comcast Voldemort: LinkedIn MongoDB: Foursquare, Etsy, bit.ly, Intuit HBase: StumbleUpon, Twitter, Infolinks, Adobe, Meetup.com
  • 12. Hadoop-based SMAQ stack http://radar.oreilly.com/2010/09/the-smaq-stack-for-big-data.html public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } }
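The Reduce class on this slide is only the last stage of a word-count job. As a rough illustration of how it fits into the whole MapReduce flow, here is a minimal in-memory sketch in Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example, and the tokenization (lowercasing, splitting on whitespace) is an assumption.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts for one key (same logic as the Java reduce method)."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a small in-memory input."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data beats algorithms"]))
```

On a real cluster the map and reduce calls run in parallel on many machines and the shuffle moves data over the network; the sketch only mirrors the data flow.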
  • 13. Hadoop-based SMAQ stack Hadoop comes with HDFS – Hadoop Distributed File System. Can be used alongside various NoSQL systems (HBase most common)
  • 14. Hadoop-based SMAQ stack Pig (Yahoo) input = LOAD 'input/sentences.txt' USING TextLoader(); words = FOREACH input GENERATE FLATTEN(TOKENIZE($0)); grouped = GROUP words BY $0; counts = FOREACH grouped GENERATE group, COUNT(words); ordered = ORDER counts BY $0; STORE ordered INTO 'output/wordCount' USING PigStorage(); Hive (Facebook) INSERT OVERWRITE TABLE xyz_com_page_views SELECT page_views.* FROM page_views WHERE page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31' AND page_views.referrer_url LIKE '%xyz.com';
  • 15. Next-generation systems: going beyond MapReduce/Hadoop http://www.nytimes.com/external/gigaom/2010/10/23/23gigaom-beyond-hadoop-next-generation-big-data-architectu-81730.html Mostly Google and Yahoo innovations. Percolator – “real-time” MapReduce. Powers Google Instant. Dremel – superfast “Hive” to interact with large datasets. In-house Google. Pregel – highly efficient graph computing for analyzing social graphs. In-house Google. Open-source projects available. Megastore – scalable NoSQL-like system with ACID semantics but lower consistency across partitions. In-house Google. Next-gen Hadoop at Yahoo: enhanced scalability (going beyond 4000-node clusters), support for multiple programming paradigms, enhanced cluster utilization.
  • 16. Intelligent Web & machine learning Recommendation systems, data/web mining, natural language processing Recommendation systems: A type of collaborative filtering/information retrieval technique. Uses user profiles, ratings, browsing habits to recommend items not yet considered. First made famous in the commercial arena by Amazon.com
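To make the idea of recommending "items not yet considered" concrete, here is a minimal sketch of user-based collaborative filtering in Python. It is one simple variant among many and is not claimed to be what Amazon.com or any other site actually runs. The rating data, user names and item names are invented for illustration, and cosine similarity over raw ratings is an assumed choice of similarity measure.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in taste
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only recommend items not yet considered
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:top_n]


# Hypothetical ratings on a 1-5 scale.
RATINGS = {
    "alice": {"hadoop_book": 5, "nosql_book": 4},
    "bob": {"hadoop_book": 5, "nosql_book": 5, "ml_book": 4},
    "carol": {"cooking_book": 5, "ml_book": 1},
}

if __name__ == "__main__":
    print(recommend(RATINGS, "alice"))
```

Here bob rates the same books as alice similarly, so his other rated item is suggested to her, while carol shares no rated items with alice and contributes nothing. Production systems typically add rating normalization, implicit signals such as browsing habits, and approximate neighbor search to cope with scale.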
  • 17. Amazon.com & Netflix recommendation systems
  • 18. Foursquare (3/2011) and Google Places (5/2011) http://engineering.foursquare.com/2011/03/22/building-a-recommendation-engine-foursquare-style/ http://places.blogspot.com/2011/05/discover-more-places-youll-like-based.html
  • 19. Hot area! Netflix and Overstock.com competitions
  • 20. Search Engines (Google, Bing, Wolfram, Lucene/Nutch, etc)
  • 21. Search innovations @ LinkedIn http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/ http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/ Uses open-source Lucene project for social graph search and real-time indexing and searching. Dynamic filters automatically generated based on your query results!
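The "dynamic filters" idea (faceted search) can be sketched in a few lines of Python: run the query first, then count field values over the hits only, so the filters offered always reflect the current result set. This is an illustrative toy and not LinkedIn's or Lucene's implementation; the profile records, field names and the naive substring matching are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical profile records.
PROFILES = [
    {"title": "Data Engineer", "company": "Acme", "location": "Birmingham"},
    {"title": "Data Scientist", "company": "Acme", "location": "Atlanta"},
    {"title": "Web Designer", "company": "Globex", "location": "Birmingham"},
    {"title": "Data Analyst", "company": "Globex", "location": "Birmingham"},
]


def search(profiles, query):
    """Naive substring match on the title; a real engine would use an inverted index."""
    needle = query.lower()
    return [p for p in profiles if needle in p["title"].lower()]


def facets(hits, fields=("company", "location")):
    """Count each field value over the hits only, most frequent first."""
    return {field: Counter(hit[field] for hit in hits).most_common() for field in fields}


def apply_filter(hits, field, value):
    """Narrow the result set after the user picks one facet value."""
    return [hit for hit in hits if hit[field] == value]


if __name__ == "__main__":
    hits = search(PROFILES, "data")
    print(facets(hits))
    print(apply_filter(hits, "location", "Birmingham"))
```

Because the counts are computed from the hits rather than from the whole collection, a different query yields a different set of filters.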
  • 22. Conclusion Big Data is a very challenging and promising area Can be used to get a competitive advantage Usually brings about advances in computer science Vast area of topics: NoSQL systems, large-scale distributed computing systems, highly scalable web system designs Machine learning techniques: search engines, recommender systems