SlideShare a Scribd company logo
1 of 83
Short URLs, Big Fun:
Understanding the World in Realtime


            Hilary Mason
         Chief Scientist, bitly

              @hmason
              h@bit.ly
http://www.pcworld.com/article/223409/move_over_dr_soong_
girls_can_build_android_apps_too.html




                        http://bit.ly/hOnbWg
[fireplace]
How do we change the world?
Can we understand the world, first?
Big Data
Data
10s of millions of URLs per day
100s of millions of clicks per day



10s of billions of URLs
encodes
{"g": "zalAU0",
 "i": "173.213.X.X",
"h": "zalAU0",
"l": "bitly",
"u": "http://www.amazon.com/Country-Life-Cal-Mag-
Potassium-Target-
Tablets/dp/B0001VUZ3A?SubscriptionId=AKIAJGA7
AAB6QE7WENSQ&tag=mycellrevi-
20&linkCode=sp1&camp=2025&creative=165953&cr
eativeASIN=B0001VUZ3A", "t": 1328266799,
"_id": "4f2bbe2f-0035d-063a1-3d1cf10a"}
decode
{"a": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
"c": "US",
"nk": 1,
"tz": "America/New_York",
"gr": "NY",
"g": "xNaZ9h",
"i": "98.118.X.X",
"h": "wXxuKW",
"k": "4eefe4be-003e4-X-X",
"l": "moma",
"al": "en-US", "hh":
"bit.ly",
"r":
"http://www.facebook.com/l.php?u=http%3A%2F%2Fbit.ly%2FwXxuKW&h=rAQG_ZZ2G
AQGQth1IOej-9_KHmbpEvh6FlllZjsAqg6A7Rw",
"u": "http://www.brainpickings.org/index.php/2012/02/02/jackson-pollock-father-letter/",
"t": 1328272481,
"hc": 1328232072,
"cy": "East Amherst",
"ll": [43.044101715087891, -78.694900512695312]}
a link
•   URL
•   Content
•   Ref distribution
•   Geo distribution
•   Language
•   Key phrases
•   Topic
Data Science?




Analytics   Science
Data Science?




Things you can                   Things you
just count.                      can’t.
Data scientists?

     engineering
                                math

                     nerds


           nerds               nerds



                     nerds
comp sci
                             hacking




                   awesome nerds
bitly science team!
What can we learn from a lot of
 people talking to each other?
A few things that we can count...
How do people use different
        devices?
What happens on the internet when
       society isn’t stable?
Revolution.
(Silly Things on the Internet)
the cutest kitten
A few things that we can count...

            cleverly.
What spoken languages are in a
           page?
raw data
"es"
"en-us,en;q=0.5"
"pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4"
"en-gb,en;q=0.5"
"en-US,en;q=0.5"
"es-es,es;q=0.8,en-us;q=0.5,en;q=0.3”
"de, en-gb;q=0.9, en;q=0.8"
entropy calculation
def ghash2lang(g, Ri, min_count=3, max_entropy=0.2):
  ""”
  returns the majority vote of a langauge for a given hash
  ""”
  lang = R.zrevrange(g,0,0)[0]
  # let's calculate the entropy!
  # possible languages
  x = R.zrange(g,0,-1)
  # distribution over those languages
  p = np.array([R.zscore(g,langi) for langi in x])
  p /= p.sum()
  # info content
  I = [pi*np.log(pi) for pi in p]
  # entropy: smaller the more certain we are! - i.e. the lower our surprise
  H = -sum(I)/len(I) #in nats!
  # note that this will give a perfect zero for a single count in one language
  # or for 5K counts in one language. So we also need the count..
               count = R.zscore(g,lang)
  if count < min_count and H > max_entropy:
      return lang, count
  else:
      return None, 1
http://4sq.com/96kc1O
What’s the context
 around a link?
[demo]
Things we have to think about.

           (science)
What’s a human?
normal click distributions
abnormal click distributions
Organic vs Inorganic?
AT SCALE
1. Research offline

2. Do fancy math – find the shortcuts

3. Design infrastructure

4. Re-design to run at scale and speed
Realtime Search
Realtime Search
Attributes calculated either at index time or
query time.

Rankings can vary by second.
[demo]
What are people paying
attention to right now?
actual rate of clicks on phrases
vs
expected rate of clicks on phrases
Dragoneye
We calculate clickrate with a sort of moving
average:




          where
Dragoneye
We represent as a sum of delta spikes.

This simplifies to:
Dragoneye
Choosing    is important.

It must be interpretable, and smooth (but not
too smooth).

We use a distribution for that is a function
that sums to 1. The function is 0 at the
origin.
[demo]
philosophy
simple math > fancy math
How do we know
when we’ve won?
How do we communicate what
  we’ve learned effectively?
Ask the crazy questions.
Thank you!




h@bit.ly
@hmason

More Related Content

Similar to Short URLs, Big Fun

Data visualization for development
Data visualization for developmentData visualization for development
Data visualization for developmentSara-Jayne Terp
 
OpenFest 2012 : Leveraging the public internet
OpenFest 2012 : Leveraging the public internetOpenFest 2012 : Leveraging the public internet
OpenFest 2012 : Leveraging the public internettkisason
 
Just the basics_strata_2013
Just the basics_strata_2013Just the basics_strata_2013
Just the basics_strata_2013Ken Mwai
 
Webinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionWebinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionLucidworks
 
The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?Frank van Harmelen
 
Understanding Artificial Intelligence
Understanding Artificial Intelligence Understanding Artificial Intelligence
Understanding Artificial Intelligence St. Petersburg College
 
October hug
October hugOctober hug
October hughuguk
 
Python 101 for Data Science to Absolute Beginners
Python 101 for Data Science to Absolute BeginnersPython 101 for Data Science to Absolute Beginners
Python 101 for Data Science to Absolute BeginnersSai Linn Thu
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Artificial Assistants: How can I help you? by Christopher Currin
Artificial Assistants: How can I help you? by Christopher CurrinArtificial Assistants: How can I help you? by Christopher Currin
Artificial Assistants: How can I help you? by Christopher CurrinChristopher Currin
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Eli White
 
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...PyData
 
Intro to Python for Data Science
Intro to Python for Data ScienceIntro to Python for Data Science
Intro to Python for Data ScienceTJ Stalcup
 
Web technology: Web search
Web technology: Web searchWeb technology: Web search
Web technology: Web searchVictor de Boer
 
Intro to Python for Data Science
Intro to Python for Data ScienceIntro to Python for Data Science
Intro to Python for Data ScienceTJ Stalcup
 
Get connected with python
Get connected with pythonGet connected with python
Get connected with pythonJan Kroon
 
Pandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data SciencePandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data ScienceKrishna Sankar
 

Similar to Short URLs, Big Fun (20)

Data visualization for development
Data visualization for developmentData visualization for development
Data visualization for development
 
OpenFest 2012 : Leveraging the public internet
OpenFest 2012 : Leveraging the public internetOpenFest 2012 : Leveraging the public internet
OpenFest 2012 : Leveraging the public internet
 
Just the basics_strata_2013
Just the basics_strata_2013Just the basics_strata_2013
Just the basics_strata_2013
 
Webinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionWebinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with Fusion
 
Progressing and enhancing
Progressing and enhancingProgressing and enhancing
Progressing and enhancing
 
The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?
 
Understanding Artificial Intelligence
Understanding Artificial Intelligence Understanding Artificial Intelligence
Understanding Artificial Intelligence
 
October hug
October hugOctober hug
October hug
 
Python 101 for Data Science to Absolute Beginners
Python 101 for Data Science to Absolute BeginnersPython 101 for Data Science to Absolute Beginners
Python 101 for Data Science to Absolute Beginners
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Artificial Assistants: How can I help you? by Christopher Currin
Artificial Assistants: How can I help you? by Christopher CurrinArtificial Assistants: How can I help you? by Christopher Currin
Artificial Assistants: How can I help you? by Christopher Currin
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011
 
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
 
Intro to Python for Data Science
Intro to Python for Data ScienceIntro to Python for Data Science
Intro to Python for Data Science
 
Web technology: Web search
Web technology: Web searchWeb technology: Web search
Web technology: Web search
 
Intro to Python for Data Science
Intro to Python for Data ScienceIntro to Python for Data Science
Intro to Python for Data Science
 
Get connected with python
Get connected with pythonGet connected with python
Get connected with python
 
Machine Learning for dummies!
Machine Learning for dummies!Machine Learning for dummies!
Machine Learning for dummies!
 
Pandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data SciencePandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data Science
 

More from Hilary Mason

PyCon 2011 Keynote
PyCon 2011 KeynotePyCon 2011 Keynote
PyCon 2011 KeynoteHilary Mason
 
Machine Learning for Web Data
Machine Learning for Web DataMachine Learning for Web Data
Machine Learning for Web DataHilary Mason
 
A Data-driven Look at the Realtime Web
A Data-driven Look at the Realtime WebA Data-driven Look at the Realtime Web
A Data-driven Look at the Realtime WebHilary Mason
 
IgniteNYC: How to Replace Yourself With a Very Small Shell Script
IgniteNYC: How to Replace Yourself With a Very Small Shell ScriptIgniteNYC: How to Replace Yourself With a Very Small Shell Script
IgniteNYC: How to Replace Yourself With a Very Small Shell ScriptHilary Mason
 
Practical Data Analysis in Python
Practical Data Analysis in PythonPractical Data Analysis in Python
Practical Data Analysis in PythonHilary Mason
 
Have data? What now?!
Have data? What now?!Have data? What now?!
Have data? What now?!Hilary Mason
 
JWU Guest Talk: JavaScript and AJAX
JWU Guest Talk: JavaScript and AJAXJWU Guest Talk: JavaScript and AJAX
JWU Guest Talk: JavaScript and AJAXHilary Mason
 
Analytics for Virtual Worlds
Analytics for Virtual WorldsAnalytics for Virtual Worlds
Analytics for Virtual WorldsHilary Mason
 
Experiential Learning in Second Life
Experiential Learning in Second LifeExperiential Learning in Second Life
Experiential Learning in Second LifeHilary Mason
 
Virtual Worlds in Education
Virtual Worlds in EducationVirtual Worlds in Education
Virtual Worlds in EducationHilary Mason
 

More from Hilary Mason (10)

PyCon 2011 Keynote
PyCon 2011 KeynotePyCon 2011 Keynote
PyCon 2011 Keynote
 
Machine Learning for Web Data
Machine Learning for Web DataMachine Learning for Web Data
Machine Learning for Web Data
 
A Data-driven Look at the Realtime Web
A Data-driven Look at the Realtime WebA Data-driven Look at the Realtime Web
A Data-driven Look at the Realtime Web
 
IgniteNYC: How to Replace Yourself With a Very Small Shell Script
IgniteNYC: How to Replace Yourself With a Very Small Shell ScriptIgniteNYC: How to Replace Yourself With a Very Small Shell Script
IgniteNYC: How to Replace Yourself With a Very Small Shell Script
 
Practical Data Analysis in Python
Practical Data Analysis in PythonPractical Data Analysis in Python
Practical Data Analysis in Python
 
Have data? What now?!
Have data? What now?!Have data? What now?!
Have data? What now?!
 
JWU Guest Talk: JavaScript and AJAX
JWU Guest Talk: JavaScript and AJAXJWU Guest Talk: JavaScript and AJAX
JWU Guest Talk: JavaScript and AJAX
 
Analytics for Virtual Worlds
Analytics for Virtual WorldsAnalytics for Virtual Worlds
Analytics for Virtual Worlds
 
Experiential Learning in Second Life
Experiential Learning in Second LifeExperiential Learning in Second Life
Experiential Learning in Second Life
 
Virtual Worlds in Education
Virtual Worlds in EducationVirtual Worlds in Education
Virtual Worlds in Education
 

Recently uploaded

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Short URLs, Big Fun

Editor's Notes

  1. This is going to be a talk for people who love the internet.
  2. The true story of bitly, engineering, data science, loveHow to do data science at scaleBuilding teams and keeping people happyClever tricks
  3. Thomas Karaganis at MS Research 1% of new URLs per day
  4. Shortened links get shared on different platforms and methods
  5. Messages move across platforms in complicated ways
  6. We have a lot of growing up to do
  7. http://www.flickr.com/photos/wanderingnome/73328967/sizes/l/in/photostream/Philosphy, science, engineering, cool tricks
  8. Pow! Surprise! Here we are!
  9. …first, we understand it
  10. …first, we understand it
  11. Asking questions.
  12. http://www.flickr.com/photos/32443746@N07/4753829490/
  13. Egypt.
  14. Tunisia.
  15. …and we can do the same thing for geo data
  16. Studied offline, using hadoopBuild a supervised classifier over timeseriesBuild a random forest ensemble decision tree classifier
  17. The data system fortune cookie gameCreative commons: http://www.flickr.com/photos/mzn37/308048794/sizes/o/in/photostream/
  18. The simplification is important for three reasons:1) A continuous function of time that simplifies to \\phi2) it’s linear, so the sum of the click rates on each page with a phrase is the click rate per phrase3) IT’S FAST
  19. The 0 at the origin insures that we have seen sustained click rates on a phrase before we think it’s anything useful.