Big Data refers to the large amounts of data being created from various sources such as mobile devices and social media. This data presents opportunities for personalization, prediction, and prevention through analyzing trends and correlations. To get started with Big Data, companies should focus on integrating their existing data from various sources and improving data quality before applying more advanced analytical techniques.
3. Housekeeping
Please... keep your phones ON and go to:
pollev.com/jmck (from any browser)
Feel free to put them on ‘silent’ though :-)
3Wednesday, 20 August 14
4. Agenda
• What is Big Data?
‣ Where did it come from? Why now?
• What opportunities does it present?
‣ Personalisation, Prediction, Prevention
• How do I get started?
• NOTE: This is an opinion piece (It’s not a science!)
9. Why Big Data?
• 7 billion people access 6 billion mobile devices
• Last year we...
‣ Sent 11 billion texts
‣ Watched 2.8 billion YouTube videos
‣ Performed 5 billion Google searches
• The world’s data doubles every 2.1 years
10. Where is it coming from?
• Increased device accessibility
• New storage paradigms
• New transaction types
• Growth in social media
• Increase in use of rich media
• More conversations
12. The Internet Of Things
• Internet-enabled everything
• Objects predict their own failure
‣ ... and wirelessly notify their manufacturers
‣ ... who automatically pre-order parts
• Objects upgrade themselves
‣ ... such as this Mac
• Objects communicate with one another
‣ Energy companies will control demand
14. Why Invent ‘Big Data’?
• Big Data is about more than just ‘lots of data’
‣ ... although that’s part of it
• Big Data is typically characterised by the ‘3Vs’:
‣ Volume
‣ Variety
‣ Velocity
15. Volume
• Typically measured in Petabytes
‣ A gigabyte is 7 minutes of HD video
‣ A terabyte is 120 hours of HD video (1024 GB)
‣ A petabyte is 14 years of HD video (1024 TB)
• Accelerating rate of growth - driven largely by mobile devices
• Prices dropping dramatically
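The video equivalents above follow from the first rule of thumb. A minimal sketch of the arithmetic, assuming 7 minutes of HD video per gigabyte and binary (1024-based) units as on the slide; the constant and function names are illustrative only:

```python
# Rule of thumb from the slide: one gigabyte holds about 7 minutes of HD video.
MINUTES_PER_GB = 7
GB_PER_TB = 1024
GB_PER_PB = 1024 * 1024


def hd_video_minutes(gigabytes):
    """Approximate minutes of HD video that fit in the given number of gigabytes."""
    return gigabytes * MINUTES_PER_GB


tb_hours = hd_video_minutes(GB_PER_TB) / 60
pb_years = hd_video_minutes(GB_PER_PB) / 60 / 24 / 365

print(f"1 TB is roughly {tb_hours:.1f} hours of HD video")
print(f"1 PB is roughly {pb_years:.1f} years of HD video")
```

Both results land just under the rounded figures quoted on the slide (120 hours and 14 years).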
16. Storage is Cheap
• Storage costs are reducing exponentially
• Data expands to fill the space available
• Heading fast towards the online ‘Personal Petabyte’
[Chart: Storage Costs ($US/MB), logarithmic scale from 0.00001 to 10,000, 1980 to 2014. Source: PC magazine, Byte magazine, newegg.com]
17. Variety
• Traditional databases are designed for well-structured data
• Making sense of free-form text?
• Extracting information from audio?
• Searching video?
• New relationship structures between data
‣ Increasing use of network modelling
18. Network Modelling
• Sentiment is viral!
• Uncovers relationships of varying types and strengths
• What are the distinct groups within your customer networks?
• Who are the most and least connected?
• Who are the ‘influencing nodes’ in your customer networks?
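The three questions above map onto basic graph operations. A minimal sketch, assuming a hypothetical list of customer-to-customer connections (all names are invented) and using connection count as a crude stand-in for influence; real network models would likely use richer measures:

```python
from collections import deque

# Hypothetical "who talks to whom" pairs; not data from the talk.
edges = [
    ("ana", "ben"), ("ben", "cho"), ("ana", "cho"),
    ("cho", "dev"), ("eli", "fay"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from connection pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connected_groups(graph):
    """Return the distinct groups (connected components) via breadth-first search."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


def by_connections(graph):
    """Rank customers from most to least connected (ties broken by name)."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))


graph = build_graph(edges)
groups = connected_groups(graph)
ranking = by_connections(graph)
print("Distinct groups:", groups)
print("Most connected:", ranking[0], "| least connected:", ranking[-1])
```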
20. Why Look At Networks?
• Unhappy customers vent frustrations on social networks
• Those using Twitter are already disproportionately upset
‣ Compared to those raising traditional complaints
• Twitter complaint response: 3 minutes ➔ 70 minutes
• Email complaints: 24 hours (30%) ➔ Never (70%)
‣ Almost ¾ of organisations are ignoring their customers!
21. Viral Complaints
• Dave Carroll's band was flying with United Airlines, whose handlers damaged his guitars
• Complaints were met with rudeness, avoidance and red tape
• His YouTube response video received 13 million hits
• Negative sentiment flooded social networks
• United Airlines’ stock dropped 10% ($180 million)
22. Velocity
• Driven by proliferation of mobile devices
• Twitter processes over 34,000 tweets every 60 seconds
• Amazon processes approximately 20 million transactions a day
• The SKA, due for completion in 2024, will generate...
‣ 1,376 petabytes per day
‣ Twice the current daily global internet traffic!
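To compare these figures, it can help to convert them all to per-second rates. A small sketch using only the numbers quoted on the slide, assuming 1024 terabytes per petabyte; the variable names are illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures as quoted on the slide.
tweets_per_second = 34_000 / 60
amazon_tx_per_second = 20_000_000 / SECONDS_PER_DAY

ska_pb_per_day = 1_376
ska_tb_per_second = ska_pb_per_day * 1024 / SECONDS_PER_DAY

# "Twice the current daily global internet traffic" implies this daily figure.
implied_internet_pb_per_day = ska_pb_per_day / 2

print(f"Twitter: about {tweets_per_second:.0f} tweets per second")
print(f"Amazon: about {amazon_tx_per_second:.0f} transactions per second")
print(f"SKA: about {ska_tb_per_second:.1f} TB per second")
print(f"Implied global internet traffic: {implied_internet_pb_per_day:.0f} PB per day")
```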
23. What’s Big Data?
• ‘Traditional’ data processing technology isn’t designed for Big Data
‣ Ask Facebook, Google, Twitter, eBay, Amazon, Walmart, ...
• Big Data could be thought of as an organisational toolkit:
‣ Application of new technologies to handle the 3Vs
‣ Application of advanced statistical tools to our data
‣ Adaptation of business processes to leverage new insight
27. Predicting Politics
• Nate Silver
• Big Data Scientist who started by predicting baseball results
• Famous for predicting 2008 US election results with 98% accuracy
• Did it again in 2012 with 100% accuracy (predicted Obama 91%)
28. Predicting Crime
• “PredPol” predictive policing initiative
• Los Angeles Police Department and the University of California
• Software predicts where crime will occur within a given area
• Based on analysis of 13 million crimes over the last 80 years
29. Predicting Crime
• Mathematical model originally determined the location of earthquake aftershocks
• Crime prediction model is updated with new crime data in real time to improve accuracy
• Result: 12% decrease in property crime, 28% decrease in burglary
30. NSW Police 3rd Eye Cameras
• Sydney police getting vest-mounted cameras
• The Wolfcom units record
‣ 6 hours of HD video
‣ 20K 12 megapixel images
‣ 500 hrs voice recording
‣ All GPS tagged
• How is this used?
31. Target
• Minneapolis father furious at ‘offensive’ marketing to his daughter
• ... due to Andrew Pole, Big Data specialist at Target
• Andrew identified about 25 products that, together, allowed him to assign each online user a ‘pregnancy prediction’ score
• ... which can also estimate the due date to within a few days!
• Target uses this to send coupons timed to very specific stages of pregnancy
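The idea of combining a set of indicator products into a single score can be sketched as a weighted sum. This is an illustration only: the product names and point values below are invented, and the slide does not describe how Target's actual model works.

```python
# Hypothetical indicator products and points; purely illustrative values.
INDICATOR_POINTS = {
    "unscented lotion": 30,
    "vitamin supplements": 25,
    "cotton balls": 15,
    "large tote bag": 10,
}


def prediction_score(basket):
    """Sum the points of every indicator product that appears in the basket."""
    return sum(
        points
        for product, points in INDICATOR_POINTS.items()
        if product in basket
    )


basket = {"unscented lotion", "vitamin supplements", "bread"}
score = prediction_score(basket)
print("Score:", score)
```

Products outside the indicator list (here, bread) contribute nothing, so the score reflects only the combination of indicators purchased.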
33. Holistic Data
• Traditional approaches used data sampling due to data volumes
‣ Take every nth record
‣ Take selected records (e.g. geographical or other segments)
• Sampling is often biased
‣ Statistical aberrations
‣ Simpson’s paradox
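Simpson's paradox shows why analysing selected segments can mislead: a trend that holds inside every segment can reverse once the segments are combined. The sketch below uses the widely cited kidney stone treatment counts as illustrative data (they are not from this presentation). Treatment A does better in each subgroup, yet B looks better overall because A was given mostly to the harder cases.

```python
# (successes, patients) per treatment and subgroup; textbook illustration data
counts = {
    "A": {"small stones": (81, 87), "large stones": (192, 263)},
    "B": {"small stones": (234, 270), "large stones": (55, 80)},
}


def rate(successes, total):
    """Success rate as a fraction."""
    return successes / total


def overall(treatment):
    """Success rate with both subgroups pooled together."""
    successes = sum(s for s, _ in counts[treatment].values())
    patients = sum(n for _, n in counts[treatment].values())
    return rate(successes, patients)


# A beats B within every subgroup...
a_wins_each_group = all(
    rate(*counts["A"][group]) > rate(*counts["B"][group])
    for group in counts["A"]
)
# ...but B beats A when the subgroups are pooled.
b_wins_overall = overall("B") > overall("A")
```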
34. Unstructured Data
• Incorporate unstructured data into your analysis
‣ Twitter, Facebook, Social Media
‣ Emails, Contact notes
‣ Audio, Pictures, Videos
‣ Networks
• Distill these and use them to feed analytic models
35. Correlation over Causation
• Traditional analysis involves testing hypotheses against our data
‣ Requiring a hypothesis
‣ Root cause analysis based on guessing reasons for behaviour
• Holistic data opens the door to a new approach
‣ Focus on influencing factors rather than possible causes
‣ Root cause analysis based on statistical probability
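Focusing on influencing factors can be as simple as ranking candidate factors by how strongly they correlate with an outcome, without first hypothesising a cause. A minimal sketch using the Pearson correlation coefficient follows; the factor names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-customer data
complaints = [0, 1, 0, 3, 2, 5]
factors = {
    "call_wait_minutes": [2, 5, 1, 12, 9, 20],
    "account_age_years": [4, 3, 4, 3, 4, 3],
}

# Rank factors by strength of association, regardless of direction
ranked = sorted(
    factors,
    key=lambda name: abs(pearson(factors[name], complaints)),
    reverse=True,
)
```

A strong correlation flags a factor worth acting on or investigating; on its own it does not establish why the relationship exists.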
40. Big Data for Complaints
• Anticipate complaints
‣ Based on statistical probability and our customer insight
• Identify the root cause of complaints
‣ Link complaints to business processes and organisational change
• Proactively engage customers in high quality conversations
‣ So poor conversations don’t escalate into complaints
41. The Opportunities
• Derive insight from customer behaviour
• Analytic probabilities rather than traditional signals
‣ Correlation over causation
• Augment our data with 3rd party intelligence
• Derive insight from non-traditional (unstructured) sources
42. How Do I Get Started With Big Data?
43. Statistically Probable Starting Point
• Big Data is not a panacea!
• Big Data will not fix your data quality issues
‣ Customer insight requires a single customer!
• Start by assessing your current information architecture
• Data Integration
47. Getting Started
• Focus on...
‣ Data Integration
‣ Data Quality
‣ Master Data Management
• Big Data can help these initiatives
‣ ...but you need to reach a minimum threshold before you start
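As a rough illustration of what data integration and a single customer view involve, the sketch below merges records from two systems into one record per customer, keyed on a normalised email address. The field names, sample records and matching rule are assumptions; real master data management usually needs fuzzier matching and explicit conflict rules.

```python
def normalise_email(email):
    """Basic data-quality step: trim whitespace and ignore case."""
    return email.strip().lower()


def merge_customers(*sources):
    """Merge records from several systems into one record per customer."""
    merged = {}
    for source in sources:
        for record in source:
            key = normalise_email(record["email"])
            target = merged.setdefault(key, {"email": key})
            for field, value in record.items():
                # Keep the first non-empty value seen for each field
                if field != "email" and value:
                    target.setdefault(field, value)
    return merged


# Hypothetical extracts from two separate systems
crm = [{"email": "Jo.Bloggs@example.com ", "name": "Jo Bloggs", "phone": ""}]
billing = [{"email": "jo.bloggs@example.com", "phone": "555-0100"}]

customers = merge_customers(crm, billing)
```

Without the normalisation step, the two records above would be treated as two different customers.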
49. What can Big Data do for me?
• You don’t need a SKA or 1.23 billion users (like Facebook) to benefit from the approaches adopted by the Big Data organisations
• The Big Data toolkit incorporates...
‣ Technology adoption
‣ Statistical modelling
‣ Business change
50. What can Big Data do for me?
• Understand your customers
‣ Customer segmentation to a segment of one: The Customer
• Anticipate their needs
‣ ... and hence their behaviour
• Drive high quality conversations with them
‣ Based on your understanding of them
51. Big Data Technologies
• Business intelligence
• Visualisation
• Infrastructure
• Agile methodologies
• New data storage architectures
• Parallel processing
• Machine learning
• Statistical modelling
52. Big Data Summary
• Characteristics
‣ Volume
‣ Velocity
‣ Variety
• Approaches
‣ Incorporating unstructured data into your analysis
‣ Holistic data rather than sampling
‣ Correlation rather than causation
• Opportunities
‣ Complaints root cause analysis
‣ Improve the quality of conversations
‣ Anticipate behaviour through deep understanding
53. Privacy - Social Media
• Legislation will always trail technology
• Social media sites’ terms most often grant them “a worldwide, non-exclusive, royalty-free license, with the right to sublicense”
• You’ve never paid Facebook or Twitter a cent
• They can ‘monetize’ both your content and your metadata
• Most legislation centres around self-policing and opt-out
54. Privacy - Individuals’ Rights
• Staples (US) operate a punitive pricing model
‣ Your IP address tells them if you live in an expensive neighbourhood
• OfficeMax addressed marketing to “Mike Seay, Daughter Killed in Car Crash”
• Washington D.C. police officer convicted after looking up licence plates of vehicles near a gay bar and blackmailing the vehicles’ owners
55. Privacy - I can buy your...
• Full name, spouse, children, ex-partners, co-habitees, current address, previous addresses, ownership status, purchase date and price, outstanding mortgage debt
• Job type, income band, credit score, credit and store cards, spending habits, charitable contributions, family events (births, deaths) and likely political affiliation
• Ethnicity, primary language, and (in the US) health information!
‣ Cancer, diabetes and clinical depression lists with credit score