1
Analytical maturity
Problem statement
Science Engineering
Stay lean and focus on value
2
4b
16
5
Countries
25m Users
$3bn Raised
10 Currencies
6
Analytical maturity
Problem statement
Stay lean and focus on value
Team Size: 2
No good cause should go unfunded
8
User engagement, frequency of visits
Prediction of user journeys and route to discovery
1) Get users to visit more frequently
2) Discover causes to give to
3) Give more frequently over a 12-month period
By using data to build an engaging and personal experience
9
10
Analytical maturity
Problem statement
Stay lean and focus on value
Team Size: 5
Science
A recommendation engine to suggest content
We needed personalisation to work out what you care about
Traditional methods don’t work in this space
Lots of research and working with academics
It's about social relationships and networks
The answer was staring us in the face every day
We can run calculations over these networks
87 million nodes
420 million relationships
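The kind of calculation that can be run over such a network can be illustrated with betweenness centrality, which the speaker notes mention by name. The sketch below is only an illustration under stated assumptions: it uses Brandes' algorithm on a tiny, made-up, unweighted and undirected graph, and the node names are hypothetical. The talk does not say which algorithm or graph model the team actually used at the scale of 87 million nodes.

```python
from collections import defaultdict, deque


def betweenness_centrality(edges):
    """Brandes' algorithm for an unweighted, undirected graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths.
        order = []
        preds = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dep = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                dep[v] += paths[v] / paths[w] * (1.0 + dep[w])
            if w != source:
                score[w] += dep[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2.0 for node, value in score.items()}


# Hypothetical toy network: bob connects most of the other people.
edges = [("alice", "bob"), ("bob", "carol"), ("bob", "dave"), ("dave", "erin")]
scores = betweenness_centrality(edges)
```

In this toy graph, bob sits on the most shortest paths between other people, so he scores highest; that is the sense in which such a measure can point to influencers and important nodes.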
So this is what we planned to build
14 years of giving behaviour, online, web traffic, academic research
Engaging
Machine Learning
Social
Graph Theory
Personal
23
Analytical maturity
Problem statement
Stay lean and focus on value
Team Size: 10
Science Engineering
To achieve our vision we built an intelligent machine that…
Give
Care about
Engaging content
Building a real-time graph is hard!
Microsoft Azure
SQL Database
Importer Service
Service Bus
Blob Storage
Website
Redis Cache
Table Storage
Website
HDInsight
F# Mailbox
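The speaker notes for this slide describe batch results being merged with real-time events in table storage, with a cache in front to absorb traffic spikes. A minimal sketch of that idea follows. Everything in it is an assumption made for illustration: the `FeedStore` class, its method names, the additive scoring and the in-memory dictionaries standing in for Table Storage and the cache are all hypothetical, not the team's actual design.

```python
from collections import defaultdict


class FeedStore:
    """In-memory stand-in for batch results, real-time events and a cache."""

    def __init__(self):
        self.batch_scores = {}                    # written by the batch job
        self.realtime_scores = defaultdict(dict)  # updated per platform event
        self.cache = {}                           # stand-in for the cache layer

    def load_batch(self, user, scores):
        """Replace a user's batch results and drop any stale cached feed."""
        self.batch_scores[user] = dict(scores)
        self.cache.pop(user, None)

    def record_event(self, user, cause, weight=1.0):
        """Add a real-time boost for a cause and invalidate the cached feed."""
        current = self.realtime_scores[user].get(cause, 0.0)
        self.realtime_scores[user][cause] = current + weight
        self.cache.pop(user, None)

    def feed(self, user, limit=3):
        """Return the top causes, merging batch and real-time scores."""
        if user in self.cache:
            return self.cache[user]
        merged = dict(self.batch_scores.get(user, {}))
        for cause, boost in self.realtime_scores.get(user, {}).items():
            merged[cause] = merged.get(cause, 0.0) + boost
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(merged, key=lambda c: (-merged[c], c))[:limit]
        self.cache[user] = ranked
        return ranked


store = FeedStore()
store.load_batch(
    "alice",
    {"cancer research": 0.75, "clean water": 0.5, "local school": 0.25},
)
before = store.feed("alice")
store.record_event("alice", "local school", weight=1.0)
after = store.feed("alice")
```

Invalidating the cached feed on every write keeps the example simple; a real system would more likely use expiry times so that bursts of events do not force constant recomputation.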
28
Analytical maturity
Problem statement
Science Engineering
Stay lean and focus on value
No good cause should go unfunded

HOW TO BECOME AN EFFECTIVE DATA SCIENTIST (WORKSHOP) - MIKE BUGEMBE

Editor's Notes

  • #4 This is what I shared with you last year/now progress etc
  • #6 This is what has generated that 3 billion in donations. You can see there is a fundraising story, an event that they are doing, donations from friends: so much information.
  • #16 All of this stuff flows through networks
  • #18 This means we can now use big data to do graph calculations. We call these betweenness and centrality calculations. They tell us the strength of relationships, the nature of relationships, who the influencers are, how to get the word around quickly, and which nodes are important. The give graph is important: it allows us to understand how generosity flows through the network and to work out the relative strength of what you care about. Without big data, some of these calculations were impossible in the past, if you i
  • #23 An engaging product: the feed, with notifications like many engaging products out there, supported by machine learning so that personal content appears on it, and the graph to bring social connections into the picture…
  • #25 We built a machine that could work out how you give (it's not only about money; some like to give time or energy). The algorithms can work out what you care about at that time; remember, what you care about is not static, it changes with circumstances and associations. Most importantly, it not only knows how to keep you engaged, it has been trained to interact and engage with you. All of this makes up the give graph, a machine-level understanding of the world of giving and how to operate in it.
  • #26 Which you can see here. This is the product that uses and consumes all of the intelligence that is presented. It changes giving from a transactional action to a more engaging social action, enabling people to give and removing the barriers that we have created for giving. As I finish off, I really cannot emphasise enough how impactful this is going to be. One of the main reasons we are in a position to do this is the vast amount of data we have collated over the years, together with the ease of use and flexibility of the tools available on the Azure platform, which really enabled us to focus on the task of ensuring that every great cause gets the funding that it requires. Thank you.
  • #27 Our team is made up of scientists and engineers, and we needed to address the problem fast without access to a large operations team that specializes in managing a Hadoop infrastructure. With Azure we really enjoyed the benefits of the platform as a service. It means our scientists and engineers can focus on solving the challenges that I outlined earlier.
  • #28 These are the components that we used. To make these algorithms real we use a combination of batch and real-time processing. The batch is the backbone of this process, and at the center of it is HDInsight. Our TS data is extracted from our on-premises SQL servers and uploaded to Azure Blob storage, and then we use MapReduce to build the graph and do the calculations. Most of our existing jobs are Java MapReduce, but we are actively looking at Spark for the machine learning; this will allow us to use Python and Scala. This is kicked off using a job schedule and the orchestration service, which spins up a cluster and schedules the MapReduce jobs. Once the jobs are finished, the cluster is destroyed and the data is imported to table storage, ready for consumption. For real time, you can see from the diagram that we combine F# mailboxes with Azure Service Bus. All events that take place on the platform are streamed through or onto the service bus, allowing us to give the user an interactive experience. The results are merged with the batch results in table storage. To present the data we use Azure Websites to host our APIs; again, this is where the platform as a service gives us a managed service and elastic scalability out of the box. We run Azure distributed cache to allow us to present the data that is persisted in Azure storage and simultaneously manage any spikes in traffic. All of this manifests itself on the product as a feed.
  • #30 Ensure that every great cause gets the funding that it deserves. That is our vision. Now, this is pretty challenging: there are billions of people in the world who have the ability to give to a cause that they care about, and if they all did, we could eradicate poverty and put cancer out of business. But why are they not giving, and how is technology going to help solve that problem? It turns out that a few years ago we found that we could address this problem with machine learning and big data. We had to use our data to address the following challenges