Play with Kaggle

Newport Interactive Marketers

Cloudera Movies Data Science Project On Big Data

Abhishek M Shivalingaiah

The document describes a data science project conducted on streaming log data from Cloudera Movies, an online streaming video service. The goals of the project were to understand which user accounts are used most by younger viewers, segment user sessions to improve site usability, and build a recommendation engine. Key steps included exploring and cleaning the data, classifying users as children or adults using a SimRank approach, clustering user sessions to identify behavior patterns, and predicting user ratings through user-user and item-item similarity models to build a recommendation system. Accuracy of 99.64% was achieved in classifying users.
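The user-user similarity step mentioned in the summary above can be illustrated with a minimal sketch. It assumes cosine similarity over co-rated items and a similarity-weighted average for prediction; the summary does not say which similarity measure the project used, and the `ratings` data and function names here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


# Hypothetical toy data: user -> {movie id -> rating on a 1-5 scale}
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}

print(predict_rating(ratings, "alice", "m3"))
```

Because alice's ratings match bob's more closely than carol's, the prediction for `m3` lands nearer to bob's rating. An item-item model would follow the same pattern with the roles of users and items swapped.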

Google Analytics

This document provides an overview of Google Analytics. It discusses why websites should use analytics and how to install Google Analytics. Key definitions are explained, like unique visitors, visits, bounce rate, etc. Case studies are presented showing how analytics was used to measure SEO performance and analyze complex user interactions on websites. Resources for learning more about analytics and SEO are also listed. The document concludes by explaining how Google Analytics works and the cookies it uses to track visitors.

Iterative Methodology for Personalization Models Optimization

moma-django overview --> Django + MongoDB: building a custom ORM layer

Gadi Oren

moma-django is a MongoDB manager for Django. It provides native Django ORM support for MongoDB documents, including the query API and the admin interface. It was developed as part of two commercial products and released as open source. In the talk we will review the motivation behind its development and its features, and go through 2-3 examples of how to use some of them: migrating an existing model, advanced queries, and the admin interface. If time permits we will discuss unit testing and South migrations. Please find the video at: http://www.youtube.com/watch?v=cxQKTDLjb-w Also check out: https://twitter.com/gadioren and www.ITculate.io

Supercharging your Organic CTR

This document discusses supercharging organic click-through rate (CTR) through the use of JSON for Linked Data (JSON-LD). It covers:
1. What JSON-LD is and the benefits it provides, like rich snippets and action buttons
2. Different implementation methods, like using WordPress plugins or Google Tag Manager
3. Examples of JSON-LD markup for things like products, reviews, and local businesses
4. Testing and monitoring the impact on organic CTR before and after implementing JSON-LD
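As an illustration of the product markup mentioned above, here is a minimal sketch that builds a JSON-LD snippet using the schema.org `Product` and `AggregateRating` types. The helper name `product_jsonld` and the sample values are assumptions for illustration and are not taken from the deck.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def product_jsonld(name, rating_value, review_count):
    """Return a <script> tag holding schema.org Product markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        },
    }
    return SCRIPT_OPEN + json.dumps(data, indent=2) + SCRIPT_CLOSE


# Hypothetical product values, for illustration only
print(product_jsonld("Example Widget", 4.5, 12))
```

The resulting tag would be placed in the page's HTML, whether by a template, a plugin, or a tag manager. Whether a search engine shows a rich result for it is up to the search engine.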

Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data

Mail.ru Group

kdd2015

This document discusses scaling machine learning and statistics for web applications like recommendations, search, and advertising. It describes LinkedIn's vision of connecting the world's professionals to make them more productive and successful. It discusses using algorithmic match-making via machine learning and data mining to connect talent with opportunities at scale. This involves continuously learning from historical user data and interactions through machine learning models and experiments to power services like search, personalized content recommendations, and advertising.

Search Engine Optimization

SD Sharma

This document discusses a proposed search engine optimization (SEO) system. It includes an abstract describing SEO and its goals. The scope section discusses how SEO is commonly used to improve search engine rankings. The proposed system would allow users to search for content by keyword and refine results. It would display search results across different formats. The system requirements, design, testing approach, and screenshots are also outlined. In conclusion, the document states that SEO is an ongoing process that requires constant adaptation to changes in technology and search engine algorithms.

How can a data layer help my seo

This document discusses how a JavaScript data layer can help with SEO by providing structured data to search engines. It covers:
1. Using HTML, microdata, and a JavaScript data layer to provide different types of structured data. A JavaScript data layer allows providing data not accessible to robots through HTML alone.
2. Benefits of a data layer for SEO, including increased organic click-through rate, better SEO analysis, and enabling dynamic remarketing in AdWords.
3. Examples of setting up a data layer using Google Tag Manager, JSON-LD syntax, and pinging Googlebot to re-crawl pages to index the new structured data.

Mozilla Foundation Metrics - presentation to engineers

John Schneider

Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...

Lippo Group Digital

1) The document proposes a modified hierarchical agglomerative clustering algorithm to profile smartphone users' interests based on their browsing history.
2) It collects browsing history data from 30 student participants over 1 month and extracts URLs to analyze user interests.
3) The algorithm clusters URLs into categories and calculates the degree of interest for each category to create user profiles.
4) Experimental results show the proposed algorithm outperforms C4.5 in execution time and accuracy for profiling users' interests based on browsing history.

A new algorithm for inferring user search goals with feedback sessions

The document proposes a new algorithm to infer user search goals from query logs by analyzing feedback sessions constructed from user click-through data. It clusters feedback sessions to discover different user search goals for a query. It generates "pseudo-documents" to better represent feedback sessions for clustering. It also proposes a new evaluation metric "Classified Average Precision" to assess goal inference performance. The algorithm is tested on a commercial search engine's query logs and is shown to effectively infer user search goals.

Recommender Systems @ Scale - PyData 2019

Serving tens of billions of personalized recommendations a day under a latency of 30 milliseconds is a challenge. In this talk I'll share our algorithmic architecture, including its Spark-based offline layer and its Elasticsearch-based serving layer, which enable running complex models under difficult scale constraints and shorten the cycle between research and production. Sonya Liberman leads the Personalization team @ Outbrain's Recommendations group, developing large-scale machine learning algorithms for Outbrain's content recommendations platform, which serves tens of billions of real-time recommendations a day. She specializes in Information Retrieval, Machine Learning, and Computational Linguistics. Before joining Outbrain, she led Research and Algorithms @ ConvertMedia (acquired by Taboola). She holds an MSc in Computer Science and a BSc in Computer Science and Computational Biology. This invited talk was given at PyData Meetup, April 2019 https://www.meetup.com/PyData-Tel-Aviv/

A new algorithm for inferring user search goals with feedback sessions

Chester County Marketing Group

This document proposes a new algorithm to infer user search goals from query logs. It clusters feedback sessions from click-through data to discover different goals for an ambiguous query. It generates pseudo-documents to represent sessions for clustering. The algorithm evaluates inferred goals using a new metric called Classified Average Precision. Experiments on a commercial search engine validate the method effectively discovers user search goals.

Find and be Found: Information Retrieval at LinkedIn

Daniel Tunkelang

Shakti Sinha and Daniel Tunkelang discuss how LinkedIn's search functionality works. They explain that LinkedIn search is personalized based on a user's profile and network. Query understanding involves tagging queries to determine entity types like people, companies, or skills. Ranking is also personalized using machine learning models trained on search logs to determine relevance for a specific user's query. The system aims to provide both globally and personally relevant results, as about two-thirds of clicks come from out of a user's network.

Digital Attribution Modeling Using Apache Spark-(Anny Chen and William Yan, A...

Spark Summit

This document summarizes an algorithmic digital attribution model implemented using Spark. It begins with an overview of digital attribution and how algorithmic models can determine attribution weights. It then discusses how the model was implemented using Spark, including data processing, model building, and attribution calculations. Key lessons learned are around memory management, iterative computation, and error handling when working with Spark.

Intro to Google Analytics and Google AdWords (March 19 2013)

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU

panagenda

Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/ DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you surely want to stay within budget and save costs wherever possible. We understand that and want to help. We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. Some approaches can also lead to unnecessary spending, for example using a person document instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new licensing model. Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep track of things. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future. Topics covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices you can apply immediately

National Security Agency - NSA mobile device best practices

Quotidiano Piemontese

Similar to Play with Kaggle

Welcome Webinar Slides

Sumo Logic

SEMNE Google Analytics Master Class - 15 Oct 2014

Jay Murphy

Recsys2016 Tutorial by Xavier and Deepak

Newport Interactive Marketers
