Bitdeli - A Platform for Creating Custom Analytics in Your Browser (PyData SV 2013)

Video can be found here: https://vimeo.com/63298686

Published in: Technology

  1. Create Custom Analytics in Your Browser. PyData 2013. Ville Tuulos, CEO, Co-Founder.
  2. Everybody (Click & Play); Business Analysts (Excel); IT / DBAs (SQL, Python); Data Hackers (MapReduce); People who implement their own infrastructure.
  3. Everybody (Click & Play); Business Analysts (Excel); IT / DBAs (SQL, Python); Data Hackers (MapReduce); People who implement their own infrastructure; Disco.
  4. Everybody (Click & Play); Business Analysts (Excel); IT / DBAs (SQL, Python); Data Hackers (MapReduce); People who implement their own infrastructure.
  5. Python is great.
  6. Python is great. MapReduce is hard.
  7. Python is great. MapReduce is hard. Servers are annoying (cloud or not).
  8. Python is great. MapReduce is hard. Servers are annoying (cloud or not). Everybody likes real-time.
  9. Python is great. MapReduce is hard. Servers are annoying (cloud or not). Everybody likes real-time. Support healthy workflows.
  10. Demo.
  11. Questions: what makes some users very active? How to reduce churn? Why do some users return? Diagram labels: Customer A, Customer B, Customer C; Daily Activity; Users.
  12. Simple, Complex; Discover, Explore.
  13. Simple, Complex; Discover, Explore; Infographics, Basic Statistics, Reports.
  14. Simple, Complex; Discover, Explore; Infographics, Basic Statistics, Reports; Segments, Funnels, Visualizations.
  15. Simple, Complex; Discover, Explore; Infographics, Basic Statistics, Reports; Query, Segments, Funnels, Slice & Dice, Descriptive Models, Visualizations.
  16. Simple, Complex; Discover, Explore; Infographics, Basic Statistics, Reports; Query, Segments, Funnels, Clustering, Slice & Dice, Descriptive Models, Visualizations, Predictive Models.
  17. DiscoDB: a persistent, immutable, compressed, lightning fast key-value(s) mapping that supports lazy boolean queries. Code: https://github.com/discoproject/discodb Docs: http://discoproject.org/doc/discodb/
  18. Building and reading a DiscoDB:

      from discodb import DiscoDB

      FILES = ['a.txt', 'b.txt', 'c.txt']

      def extract_words():
          # Yield (word, filename) pairs, one per word occurrence.
          for fname in FILES:
              with open(fname) as f:
                  for word in f.read().split():
                      yield word, fname

      db = DiscoDB(extract_words())

      db['dog']           # files that mention 'dog'
      db.keys()           # all distinct words
      db.unique_values()  # all distinct filenames
      db.items()          # all (word, iter(fname)) pairs
  19. DiscoDB Chunk. Hash Map: hash(Key) → Key ID. Value Map: Key ID → [Value ID, ...]. Keys: Key ID → Key. Values: Value ID → Value.
  20. DiscoDB Chunk. Hash Map: hash(Key) → Key ID. Value Map: Key ID → [Value ID, ...]. Keys: Key ID → Key. Values: Value ID → Value. Perfect hashing by CMPH, guaranteed O(1). The list of Value IDs is delta-encoded. Values are compressed with a global Huffman codebook.
  21. Diagram: Node 1, Node 2, Node N, each a Disco Node running a Python Worker; DDFS; multiple DiscoDB Chunks.
  22. Querying with Conjunctive Normal Form (from discodb.query import Q). DiscoDB: A → [Apple, Orange, Banana]; B → [Apple, Banana]; C → [Banana, Melon]. Q("A & B") → Apple, Banana. Q("A | B") → Apple, Orange, Banana. Q("(A | B) & C") → Banana.
  23. Funnel. Model: Event → Users. Query (sequence of events): Q("Event A & Event B & ..."). https://github.com/tuulos/bd3-mixpanel-funnel
  24. Cohort Analysis. Model: Day N → Users. Query (weekly cohorts): Q("(dayN | dayN+1) & (dayM | dayM+1...)"). https://github.com/tuulos/bd3-mixpanel-cohort
  25. Time Series. Model: Day N → Users. Query (one time series): [Q(Day K) for K in range(start, end)]. https://github.com/tuulos/bd3-mixpanel-trends
  26. Thank You! Interested? Contact ville@bitdeli.com. Free analytics for your GitHub repos: https://bitdeli.com/free
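
The boolean queries on slide 22 can be illustrated without DiscoDB installed. The sketch below models each key's value list as a plain Python set and evaluates a query in conjunctive normal form: OR within a clause, AND across clauses. It is not the DiscoDB API; `eval_cnf`, `INDEX`, and the list-of-lists query encoding are hypothetical names chosen for this illustration. The third query is written as (A | B) & C, which is the conjunctive reading that yields the single result Banana listed on the slide.

```python
# Plain-Python stand-in for the slide-22 data: sets model the value lists.
# NOT the DiscoDB API; eval_cnf and the query encoding are hypothetical.
INDEX = {
    "A": {"Apple", "Orange", "Banana"},
    "B": {"Apple", "Banana"},
    "C": {"Banana", "Melon"},
}


def eval_cnf(index, clauses):
    """Evaluate a CNF query: OR within each clause, AND across clauses."""
    result = None
    for clause in clauses:
        # Union of the value sets of every key in this clause.
        # Keys missing from the index contribute nothing.
        matched = set()
        for key in clause:
            matched |= index.get(key, set())
        # Keep only the values that earlier clauses also allowed.
        result = matched if result is None else result & matched
    return result if result is not None else set()


a_and_b = eval_cnf(INDEX, [["A"], ["B"]])             # A & B
a_or_b = eval_cnf(INDEX, [["A", "B"]])                # A | B
a_or_b_and_c = eval_cnf(INDEX, [["A", "B"], ["C"]])   # (A | B) & C

print(sorted(a_and_b))
print(sorted(a_or_b))
print(sorted(a_or_b_and_c))
```

Each inner list is one OR-clause, so a pure AND query is a list of single-key clauses and a pure OR query is a single clause.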

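The three models on slides 23 to 25 (Event → Users, Day N → Users) reduce to set operations over user IDs. The following is a minimal sketch with made-up data, again using plain sets instead of DiscoDB; `EVENT_USERS`, `DAY_USERS`, `funnel`, `cohort`, and `time_series` are hypothetical names, and the event names and user IDs are invented for illustration.

```python
# Hypothetical toy data, not taken from the talk.
# Event name -> users who triggered it (the funnel model).
EVENT_USERS = {
    "signup": {"u1", "u2", "u3", "u4"},
    "add_to_cart": {"u1", "u2", "u3"},
    "purchase": {"u2", "u3", "u5"},
}
# Day number -> users active on that day (the cohort / time series model).
DAY_USERS = {
    0: {"u1", "u2"},
    1: {"u2"},
    2: {"u2", "u3", "u4"},
}


def funnel(event_users, steps):
    """Count users left after each step: step1, step1 & step2, ..."""
    counts = []
    remaining = None
    for step in steps:
        users = event_users.get(step, set())
        remaining = set(users) if remaining is None else remaining & users
        counts.append(len(remaining))
    return counts


def cohort(day_users, first_days, later_days):
    """Users seen in the first period AND in the later one:
    (dayN | dayN+1 | ...) & (dayM | dayM+1 | ...)."""
    first = set().union(*(day_users.get(d, set()) for d in first_days))
    later = set().union(*(day_users.get(d, set()) for d in later_days))
    return first & later


def time_series(day_users, start, end):
    """One data point per day, mirroring [Q(Day K) for K in range(start, end)]."""
    return [len(day_users.get(day, set())) for day in range(start, end)]


funnel_counts = funnel(EVENT_USERS, ["signup", "add_to_cart", "purchase"])
returning = cohort(DAY_USERS, [0, 1], [2, 3])
daily_counts = time_series(DAY_USERS, 0, 4)

print(funnel_counts)
print(sorted(returning))
print(daily_counts)
```

Note that a plain AND over events only checks that a user triggered every event; it does not by itself check the order in which they occurred.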
×
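Slide 20 notes that the list of Value IDs is delta-encoded. The sketch below shows the general idea of delta encoding on a sorted list of integer IDs: store the first ID, then only the gap to each following ID, so that large IDs become small numbers that compress well. This is a generic illustration and makes no claim about DiscoDB's actual on-disk format; `delta_encode` and `delta_decode` are hypothetical helper names, and the IDs are invented.

```python
def delta_encode(sorted_ids):
    """Store the first ID as-is, then the gap to each following ID."""
    deltas = []
    previous = 0
    for value_id in sorted_ids:
        deltas.append(value_id - previous)
        previous = value_id
    return deltas


def delta_decode(deltas):
    """Rebuild the original IDs with a running sum over the gaps."""
    ids = []
    total = 0
    for delta in deltas:
        total += delta
        ids.append(total)
    return ids


value_ids = [1000, 1003, 1004, 1010, 1025]  # made-up, sorted Value IDs
encoded = delta_encode(value_ids)
decoded = delta_decode(encoded)

print(encoded)  # [1000, 3, 1, 6, 15]
print(decoded == value_ids)
```

The gaps after the first entry are much smaller than the IDs themselves, which is what makes a subsequent variable-length or entropy coding step effective.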