Description:
One of the biggest challenges for people building data products today is developing and refining features for modeling purposes (i.e. feature extraction) with the volume and variability of web scale data. In this talk, Martin will discuss some of the challenges and solutions faced by Kontagent as it built out a predictive lifetime value model for its customers. As you will learn, Hadoop is critical to this feature extraction process, and Cascading is quite handy when building out more complex features than can be readily developed in a query framework like Hive.
Speaker:
Martin Colaco, Director of Data Science for Kontagent
Feature Extraction for Predictive LTV Modeling using Hadoop, Hive, and Cascading - Kontagent
2. Agenda
• What is predictive modeling
• What is Lifetime Value (LTV)
• What is feature extraction - challenges
• How can we build a cohort-based predictive LTV model
o Python
o Hive with Hadoop
o Cascalog with Hadoop
3. Can we predict how many attendees tonight?
• How to estimate? Door count (after the fact)
• Is there a way to build a model that we can use to predict attendees?
4. Predicting how many attendees tonight?
Attendees = Registrations x % Attendance + Non-registrants
5. Predicting how many attendees tonight?
Attendees = Registrations x % Attendance + Non-registrants
Attendees = 201 x 50% + 25 ≈ 125
Lots of Uncertainty: Location, Date & Time, Company, Speaker, Title & Topic
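The estimate on this slide can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function and parameter names are hypothetical, and the inputs (201 registrations, 50% attendance, 25 non-registrants) are the ones shown on the slide.

```python
def estimate_attendees(registrations: int, attendance_rate: float, non_registrants: int) -> float:
    """Attendees = Registrations x % Attendance + Non-registrants."""
    return registrations * attendance_rate + non_registrants


# Inputs taken from the slide; the slide rounds the result to 125.
estimate = estimate_attendees(201, 0.50, 25)
print(estimate)  # 125.5
```

Each input is itself an estimate, which is where the uncertainty listed on the slide enters.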
6. Predictive Modeling
• Know the question you want to answer
• Look at historical behavior
• Apply understanding of those behaviors to new situations -> new groups of users
Data -> Feature Extraction -> Model Selection -> Model Validation -> Success (Fame, Riches)
7. Common use cases for predictive modeling
My chemical engineering roots….
In – Out = Accumulation
In -> Δ -> Out
8. Users: Maximizing Growth
In – Out = Accumulation
In -> Δ = Growth -> Out
App or Network of Apps
In: Paid marketing, Organic, X-promotion
Out: Frustration? Boredom? Too expensive? Bad UX? No new content?
9. Money: Maximizing Profit
In – Out = Accumulation
In -> Δ = Profit -> Out
App or App Network or Business
In: Lifetime Value (LTV)
Out: Business expenses: Marketing costs, Operations (servers, etc.), Employee costs
10. How Do We Estimate LTV
Business Model                               LTV
Download                                     Cost per Download
Subscription                                 Avg. Price x Avg. Customer Lifetime
Microtransactions (Ads / In-app-purchases)   ???
11. LTV Modeling – Social / Mobile Games
LTV = (1 + k) * Retention * ARPU
Output Variable: LTV
Features: Retention, ARPU
[Figure: Daily Retention Curve (% of users retained, 0% to 100%) and ARPDAU Curve (ARPDAU, $0.00 to $0.10), each plotted against days since install (0 to 8)]
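One way to read the formula together with the two curves is to multiply retention by ARPDAU for each day since install, sum over days, and scale by (1 + k). The sketch below does that. The per-day summation is an interpretation rather than something the slides state, and the function name and sample curve values are hypothetical (they are not read off the charts).

```python
def predict_ltv(retention, arpdau, k_factor=0.0):
    """Per-install LTV as (1 + k) * sum over days of retention[d] * arpdau[d].

    retention[d]: fraction of the cohort still active d days after install.
    arpdau[d]:    average revenue per daily active user on that day.
    """
    if len(retention) != len(arpdau):
        raise ValueError("retention and arpdau must cover the same days")
    base = sum(r * a for r, a in zip(retention, arpdau))
    return (1.0 + k_factor) * base


# Hypothetical three-day curves, for illustration only.
retention_curve = [1.00, 0.50, 0.25]
arpdau_curve = [0.10, 0.08, 0.04]
ltv = predict_ltv(retention_curve, arpdau_curve)  # k-factor left at 0
print(round(ltv, 4))
```

Leaving `k_factor` at 0 matches the talk's choice to ignore k-factor.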
12. Predictive LTV Result
[Figure: Cumulative Spend (0 to 300) vs. Days Since Install (0 to 100)]
13. Challenges with this simple LTV model
• All of these parameters are moving targets
• k-factor is wildly variable (we’ll ignore k-factor in this presentation)
• Acquisition costs can change (as can LTV and retention) - Cohort LTV by install date and install source
[Figure: ARPDAU Curve (ARPDAU, $0.00 to $0.10) and Retention Curve (% of users retained, 0% to 100%), each plotted against days since install (0 to 8)]
14. Challenges with this simple LTV model
• All of these parameters are moving targets
• k-factor is wildly variable (we’ll ignore k-factor in this presentation)
• Acquisition costs can change (as can LTV and retention) - Cohort LTV by install date and install source
• Retention is computationally difficult to calculate
• Large games can have millions of users who spend money over many months/years
How can we build out the features we need to model LTV by cohort?
15. Kontagent Facts
• Founded in 2007
• 130+ employees and growing
• 100s of Customers
• 1000s of Apps Instrumented
• 250+ billion events per month
• 200MM+ MAUs
• 1 Trillion Events in 2013
16. How does Kontagent collect data?
• Via a REST API
o APA – Install message
o EVT – Custom event message (user action)
o MTU – Spending message
• Yields a transaction log over time:
17. Feature Extraction for Predictive LTV
Need to translate a transaction log into a table
o Install Date
o Install Source
o Activity Date
o Spend on Date
o Cumulative Spend to Date
o Users Active on Date
o Users Active on Date or After
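To make the target table concrete, here is a minimal single-threaded Python sketch that turns a small in-memory log into per-date features for one install cohort. The tuple layout, the sample rows, and the function name are assumptions made for illustration, not Kontagent's actual log format. Install source is left out for brevity, and spend is assumed to be recorded in cents.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (message_type, user_id, event_date, value_in_cents).
LOG = [
    ("APA", "u1", date(2011, 7, 8), 0),
    ("APA", "u2", date(2011, 7, 8), 0),
    ("APA", "u3", date(2011, 7, 9), 0),      # installs a day later: other cohort
    ("EVT", "u1", date(2011, 7, 10), 0),
    ("EVT", "u2", date(2011, 7, 10), 0),
    ("MTU", "u1", date(2011, 7, 10), 7500),
    ("EVT", "u1", date(2011, 7, 11), 0),
    ("MTU", "u1", date(2011, 7, 11), 500),
    ("EVT", "u3", date(2011, 7, 11), 0),
]


def cohort_features(log, install_date):
    """Build one feature row per activity date for users installed on install_date."""
    cohort = {user for kind, user, day, _ in log if kind == "APA" and day == install_date}
    active = defaultdict(set)   # activity date -> users seen that day
    spend = defaultdict(int)    # activity date -> cents spent that day
    for kind, user, day, cents in log:
        if user not in cohort:
            continue
        if kind == "EVT":
            active[day].add(user)
        elif kind == "MTU" and cents > 0:
            spend[day] += cents
    rows, cumulative = [], 0
    for day in sorted(set(active) | set(spend)):
        cumulative += spend[day]
        rows.append({
            "install_date": install_date,
            "activity_date": day,
            "users_active_on_date": len(active[day]),
            "spend_on_date": spend[day] / 100,
            "cumulative_spend_to_date": cumulative / 100,
        })
    return rows


for row in cohort_features(LOG, date(2011, 7, 8)):
    print(row)
```

Everything here lives in memory, which is the limitation a single-threaded approach runs into once the log reaches millions of rows.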
18. How can we compute this table of features?
• Python – single thread
o Might work in some cases but need to cache potentially millions of rows of data
• Hive with Hadoop
o Data warehouse system that allows SQL-like querying capabilities of distributed data structures
o Let’s work through this….
19. Hive query
• Store data in Hadoop
  Transaction log -> APA, EVT, MTU
• Query using Hive Query Language (HiveQL)
    select distinct s
    from demo_apa
    where kt_date(utc_timestamp) = '2011-07-08'
      and s is not null
      and month=201107
20. This query gets cumbersome quickly…
    select sub1.gameplay_date as play_date, sub1.returned,
           sub2.spenders, sub2.total_daily_spend
    from
    (
      -- users from the 2011-07-08 install cohort who returned, per gameplay date
      select gp.gameplay_date, count(distinct gp.s) as returned
      from
      (
        select distinct s
        from demo_apa
        where kt_date(utc_timestamp) = '2011-07-08' and s is not null
          and month=201107
      ) base
      left outer join
      (
        select s, kt_date(utc_timestamp) as gameplay_date
        from demo_evt
        where s is not null and month>=201107
      ) gp on gp.s = base.s
      group by gp.gameplay_date
    ) sub1
    join
    (
      -- spenders and total spend from the same cohort, per spend date
      select sp.spend_date, count(distinct sp.s) as spenders,
             sum(sp.spend)/100 as total_daily_spend
      from
      (
        select distinct s
        from demo_apa
        where kt_date(utc_timestamp) = '2011-07-08' and s is not null
          and month=201107
      ) base
      left outer join
      (
        select s, kt_date(utc_timestamp) as spend_date, v as spend
        from demo_mtu
        where s is not null and v>0 and month>=201107
      ) sp on sp.s = base.s
      group by sp.spend_date
    ) sub2 on sub1.gameplay_date=sub2.spend_date

Result:
    play_date   returned   spenders   total_daily_spend
    7/10/2011   2          1          75
    7/11/2011   4          2          19
    7/12/2011   1          1          0.2
21. Feature Extraction with HiveQL
o Install Date
o Install Source
o Activity Date
o Users Active on Date
o Spend on Date
o Users Active on Date or After
o Cumulative Spend to Date
Problem - HiveQL doesn’t support non-equi-joins
Options for improving Hive performance
• Write tables or temp tables
• Code up some UDFs
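The non-equi-join problem is easiest to see with "Users Active on Date or After", which compares dates with >= rather than =. The Python sketch below shows one possible reading of that feature: count the users whose last recorded activity falls on or after each date. That reading, the function name, and the sample data are all assumptions, not taken from the talk.

```python
from datetime import date


def users_active_on_or_after(activity, days):
    """Count users whose last activity date is >= each day in days.

    activity: iterable of (user_id, activity_date) pairs.
    The >= comparison is the inequality that an equality-only join cannot express.
    """
    last_seen = {}
    for user, day in activity:
        if user not in last_seen or day > last_seen[user]:
            last_seen[user] = day
    return {d: sum(1 for last in last_seen.values() if last >= d) for d in days}


# Hypothetical activity for three users.
activity = [
    ("u1", date(2011, 7, 10)),
    ("u1", date(2011, 7, 12)),
    ("u2", date(2011, 7, 10)),
    ("u3", date(2011, 7, 11)),
]
days = [date(2011, 7, 10), date(2011, 7, 11), date(2011, 7, 12)]
counts = users_active_on_or_after(activity, days)
print(counts)
```

Reducing each user to a single last-activity date first keeps the inequality comparison out of the join itself, which is roughly the kind of work the temp-table and UDF options above would take on in Hive.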
22. How can we compute this table of features?
• Python – single thread
• Hive with Hadoop
• Cascalog (Cascading) with Hadoop
o Cascading is a flow-based computational model for Hadoop
o Cascalog is a declarative query system built on Cascading
o Let’s work through this…
24. Feature Extraction with Cascalog
o Install Date
o Install Source
o Activity Date
o Users Active on Date
o Spend on Date
o Users Active on Date or After
o Cumulative Spend to Date
Options for improvement
• Code not optimized – CPU limited
25. What have we learned
• Martin sucks (or is awesome) at predicting number of attendees at Meetups!
• Predictive modeling (particularly around LTV) can have a huge impact on a business
o Requires intuition and iteration
o In the big data world, feature extraction can be quite a huge challenge
• Feature extraction can be done with Hadoop
o HiveQL is nice because analysts can use it, but it can be inefficient and not generate all the features we need
o Cascading can solve most of these problems and generate the clean features we need
26. Questions?
Need a job? We’re hiring:
http://www.kontagent.com/company/careers/
Martin Colaco
Head of Data Science
martin.colaco@kontagent.com