Big Data
University of Kent
23rd April 2015
@DigContactLtd
Discussion items
• Who are Digital Contact?
• Problems with Big Data
• Hadoop
• Word2Vec
• (More) Problems with Big Data
• Election debates
Who are Digital Contact?
• We are a big data product company
• Focus on developing products and services
for business-to-business and business-to-consumer
markets
• Currently developing trading.co.uk
Problems with big data
• Often described as the three V’s:
1. Volume – Huge quantities of data available
2. Velocity – Data constantly produced by both people and
3. Variety – Data can be both structured and unstructured
• How can we tackle some of these problems?
Hadoop
• Hadoop is an open-source software framework
• Grew out of the Apache Nutch project and was developed largely
at Yahoo to deal with ever-increasing amounts of content
• It allows you to store and process data in a distributed
fashion (i.e. over a number of machines)
• This allows for 2 key things: massive data storage and
faster processing
• It’s an incredibly powerful system but, as it’s relatively
new, there is little documentation on it
• Used by Amazon, eBay, Facebook, LinkedIn and many
more
Hadoop – data storage
• Hadoop allows for huge data files to be stored across
multiple machines
• Takes files and breaks them into blocks (normally
64 or 128 MB)
• Blocks are stored in data nodes and are typically
replicated across 3 nodes per block
• A master node maintains the location of the blocks and
which file they belong to – however, it doesn’t store the
blocks itself
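The storage scheme above can be sketched in a few lines of Python. This is a toy model, not Hadoop's own code: the function names, the node names and the simple round-robin placement are all assumptions made for illustration (real HDFS placement also takes racks into account), and the block size is counted in bytes rather than megabytes so the example stays small.

```python
def split_into_blocks(data, block_size):
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, data_nodes, replication=3):
    """Return the metadata a master node would keep: block index -> data nodes.

    Assumption: simple round-robin placement, chosen only for illustration.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    n = len(data_nodes)
    return {
        block: [data_nodes[(block + r) % n] for r in range(replication)]
        for block in range(num_blocks)
    }


# Toy file of 300 bytes with a 128-byte block size (hypothetical values).
blocks = split_into_blocks(b"x" * 300, block_size=128)
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
block_map = place_blocks(len(blocks), nodes)

for block, holders in block_map.items():
    print(block, len(blocks[block]), holders)
```

Note that the map only records where each block lives; the block contents stay on the data nodes, which mirrors the master node's role described above.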
Hadoop – data storage
• Allows for complete redundancy – data nodes are easily replaceable
• Allows for faster access to the data – system can request data from 3 places and use the fastest return
• Storage is reduced to 1/3 capacity but:
• Files can be read in a compressed format
• Redundancy is worth the cost
• Higher failure rates permissible for data nodes
• Storage is cheap!
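The trade-off above is simple arithmetic, sketched below. The function names and the 30 TB figure are hypothetical; the only inputs taken from the slide are the replication factor of 3 and the idea that a block survives as long as one replica is still on a working node.

```python
def usable_capacity_tb(raw_tb, replication=3):
    """Usable storage once every block is stored `replication` times."""
    return raw_tb / replication


def block_survives(failed_nodes, replica_nodes):
    """A block is still readable if at least one replica sits on a healthy node."""
    return any(node not in failed_nodes for node in replica_nodes)


# Hypothetical cluster: 30 TB of raw disk, three replicas per block.
print(usable_capacity_tb(30))                          # a third of the raw disk
print(block_survives({"a", "b"}, ["a", "b", "c"]))     # one replica left
print(block_survives({"a", "b", "c"}, ["a", "b", "c"]))  # all replicas lost
```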
Hadoop – data processing
• Once the data’s in, how is it processed?
• One major component of Hadoop is MapReduce
• Doesn’t try to process everything all at once
• Instead, processes chunks of data and tallies up results
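The "process chunks, then tally" idea can be shown with a word count run entirely in memory. This is a minimal sketch of the MapReduce pattern rather than Hadoop's API: the three function names and the sample sentences are made up for the example, and on a real cluster each chunk would be mapped on a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of lines."""
    for line in chunk:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: tally the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Two chunks standing in for blocks held on different machines.
chunks = [["big data is big"], ["data is everywhere"]]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```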
Hadoop – data processing
• Designed for massive data sets
• Not suitable for processing small sets quickly (although other tools on Hadoop can do this
in real time)
• Allows users to stream data through other programming languages
• During the most recent debate, we were able to extract named entities and sentiment from 10,000,000
tweets in 3 minutes 30 seconds! (more on this later)
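Streaming data through another language usually means Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines and write tab-separated key/value lines, and the reducer receives its input sorted by key. The sketch below imitates that contract locally in Python. It is not the pipeline used for the debates: the hashtag counting task, the function names and the sample tweets are all assumptions.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'hashtag<TAB>1' for every hashtag found in the input lines."""
    for line in lines:
        for token in line.split():
            if token.startswith("#"):
                yield f"{token.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; relies on the input being sorted by key."""
    parsed = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Local stand-in for the framework: map, sort by key, then reduce.
tweets = ["Watching #GE2015 tonight", "#ge2015 #leadersdebate was lively"]  # made-up tweets
mapped = sorted(mapper(tweets))
result = list(reducer(mapped))
print(result)
```

For a sense of scale, and taking 3:30 to mean 3 minutes 30 seconds, 10,000,000 tweets in 210 seconds is roughly 47,600 tweets per second.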
Working with data
• Hadoop can help with volume and velocity of data – what about
variety?
• Need methods to add structure to unstructured data
• For working with text, we’ve been looking at Word2Vec
Word2Vec
• Developed and released as an open-source project by Google
• Described as a ‘really, really big deal’ by the head of Kaggle (a data science
competition website)
• Works by representing every word as a vector (a series of numbers placed so that
words used in similar contexts end up close together)
• Trains by taking a word and working out how likely other words are to come
before and after it
• It’s maths with words
• Allows you to do some really interesting stuff…
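The training idea above, looking at the words that come before and after each word, can be made concrete by generating the (word, neighbour) pairs a model would learn from. This is only a sketch of the data preparation step under an assumed window size; the function name and the sample sentence are hypothetical, and the actual training (fitting vectors so that each word predicts its neighbours) is not shown.

```python
def context_pairs(tokens, window=2):
    """Pair each word with the words up to `window` positions before and after it."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


sentence = "the debate was lively".split()  # made-up sentence
pairs = context_pairs(sentence, window=1)
for target, neighbour in pairs:
    print(target, "->", neighbour)
```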
Word2Vec uses
>>> model.doesnt_match("man woman child kitchen".split())
'kitchen'
>>> model.most_similar("awful")
[(u'terrible', 0.6721246242523193),
 (u'horrible', 0.6031243205070496),
 (u'dreadful', 0.5896061658859253),
 (u'atrocious', 0.5460706949234009),
 (u'laughable', 0.5287274122238159),
 (u'horrendous', 0.521348237991333),
 (u'abysmal', 0.5080942511558533),
 (u'appalling', 0.4996950328350067),
 (u'amateurish', 0.4995490610599518),
 (u'lousy', 0.49693402647972107)]
Word2Vec uses
• Works well as a thesaurus
• Able to look for similar words and find odd ones out
• Useful to overcome issues around synonymy
• Even more helpful is that it models relationships between words
• We can see this when we model the words on a 2d space
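Both the thesaurus and the odd-one-out behaviour come down to comparing vectors, typically with cosine similarity (the scores gensim's most_similar returns are cosine similarities). The sketch below uses hand-made 2D vectors, which are pure invention for illustration, and shows one plausible way to pick an odd one out: choose the word least similar to the group average. It is not necessarily what the library does internally.

```python
import math


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 2D vectors; real models use hundreds of dimensions.
toy = {
    "man": [0.9, 0.1],
    "woman": [0.8, 0.2],
    "child": [0.85, 0.3],
    "kitchen": [0.1, 0.9],
}


def odd_one_out(words, vectors):
    """Return the word least similar to the average vector of the group."""
    dims = len(vectors[words[0]])
    mean = [sum(vectors[w][d] for w in words) / len(words) for d in range(dims)]
    return min(words, key=lambda w: cosine(vectors[w], mean))


print(odd_one_out(["man", "woman", "child", "kitchen"], toy))
```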
Word2Vec uses
• Related words have similar relationships:
Word2Vec uses
• Paths between related words are also consistent:
Word2Vec uses
• Can generate useful results:
Word2Vec uses
We can also add and subtract words for more information:
• King + Woman – Man = Queen
• London + France – England = Paris
• Bigger – Big + Cold = Colder
• Sushi – Japan + Germany = Bratwurst
• Cu – Copper + Gold = Au
• Windows – Microsoft + Google = Android
• Tim Cook – Apple + Microsoft = Satya Nadella
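The word arithmetic above is ordinary vector addition and subtraction followed by a nearest-neighbour lookup. Here is a minimal sketch using invented 2D vectors (one axis loosely standing for "royalty", the other for "gender"); the vectors, the distractor words and the function names are all assumptions, and a trained model would learn such directions from text rather than have them written in by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Made-up vectors: [royalty, gender], with male = +1 and female = -1.
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "prince": [0.7, 1.0],
    "apple": [0.1, 0.1],
}


def analogy(a, b, c, vectors):
    """Return the word closest to a + b - c, ignoring the three input words."""
    target = [x + y - z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "woman", "man", toy))  # King + Woman - Man
```

With a trained gensim model the equivalent query is usually written as most_similar(positive=["king", "woman"], negative=["man"]).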
Word2Vec uses
• My personal favourite:
Word2Vec uses
Wide range of applications for this model:
• Answering queries
• Understanding meaning of new words
• Easy to understand results
• Good for finding similar documents in a large corpus
• Intelligent localised searches
• Machine Translation
• Detecting sarcasm
• Sentiment analysis
• Pub quizzes…
(More) Problems with big data
• More V’s for data science to deal with:
1. Veracity – Data contains noise – need to keep data ‘clean’
2. Validity – Data needs to be correct and fit for purpose
3. Volatility – Data needs to be relevant to the analysis
4. Viewership – Results need to be appropriate to the audience
• Quick case study
Leaders’ Debates
• Over 10,000,000 election tweets
• Looked for mentions of parties or leaders
• Analysed tweets for sentiment
• Gave interesting insights into debates
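Looking for mentions of parties or leaders and counting them per minute could be sketched as follows. This is an illustration only, not the pipeline we ran: the keyword lists, the sample tweets and their timestamps are invented, the matching is a naive substring test, and sentiment scoring is left out because the method is not described here.

```python
from collections import Counter
from datetime import datetime

# Hypothetical keyword lists; a real list would be longer and more careful.
PARTY_TERMS = {
    "SNP": ["snp", "sturgeon"],
    "UKIP": ["ukip", "farage"],
}


def mentions_by_minute(tweets):
    """Count tweets mentioning each party, bucketed by the minute they were posted."""
    counts = Counter()
    for timestamp, text in tweets:
        minute = timestamp.replace(second=0, microsecond=0)
        lowered = text.lower()
        for party, terms in PARTY_TERMS.items():
            if any(term in lowered for term in terms):
                counts[(minute, party)] += 1
    return counts


# Made-up tweets with made-up timestamps.
sample = [
    (datetime(2015, 4, 2, 20, 0, 5), "Sturgeon doing well so far"),
    (datetime(2015, 4, 2, 20, 0, 40), "SNP making their case again"),
    (datetime(2015, 4, 2, 20, 1, 10), "Farage versus the audience"),
]
counts = mentions_by_minute(sample)
for (minute, party), total in sorted(counts.items()):
    print(minute.strftime("%H:%M"), party, total)
```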
First debate
• Social Media mentions by minute:
First debate
• SNP mentions climbed steadily:
First debate
• SNP fared better overall and leader out-performed party:
Leaders’ Debates
• Data was processed with Hadoop within 5 minutes of the debate finishing
• Analysed 10,000,000 tweets and extracted relevant information
• Able to provide a clear picture of social media
• Interesting result in second debate…
Second debate
• Guess when Nigel Farage criticised the audience:
Final points
• Huge number of tools and methods for dealing with Big Data
• Good idea to work out what you want to find
• Is your data big? Can it be made bigger?
• Are your results useful? Can they be improved?
• Have fun!
Questions
Twitter: @DigContactLtd
Email: marketing@digitalcontact.co.uk

Digital Contact's big data presentation to the University of Kent
