I am Srinu Adira, and I manage Business Solutions at LinkedIn. I am primarily responsible for providing and enabling data solutions that support iterative business decisions. Today, I am going to talk about the big data ecosystem at LinkedIn. We manage hundreds of terabytes of data by leveraging scalable infrastructure.
LinkedIn’s mission is to connect the world’s professionals to make them more productive and successful. There are a total of 640M professionals in the world. As of last week, we have 200M registered members from all over the world. LinkedIn operates the world’s largest professional network on the Internet, in over 200 countries and territories.
LinkedIn believes in connecting talent with opportunity at massive scale. LinkedIn relies heavily on scalable infrastructure to track user behavior. We believe in transforming the way the world works.
LinkedIn’s member base grew gradually during the initial years. For the past 3 years, growth has been phenomenal. Yup, we have reached the 200M mark. Members are currently joining at a rate of approximately 2 per second, and over 2M companies have a presence on LinkedIn. More than 80% of Fortune 100 companies use LinkedIn to hire professionals. Approximately 5.7B searches were done on LinkedIn in 2012. The bottom line is that we are growing fast and expanding the professional network.
Why is data important for LinkedIn? Data is everywhere, and LinkedIn believes in data.
Our founder, Reid Hoffman, believes in constant iterations of product improvements. Our site reflects the same belief as we keep adding new features.
Of course, our SVP is a firm believer in data: to fix something, we need to measure it. That’s where data plays a critical role. Extending that further, what gets measured gets improved, and what gets analyzed gets monetized.
How does LinkedIn’s network impact our members? LinkedIn analyzes large amounts of data on a daily basis. This analysis results in relevant and valuable products, business solutions, and services. These improvements in products and services, in turn, drive member growth and engagement, which generate more data. And the cycle of improvements continues.
Who are the main drivers behind these solutions? Our business analytics teams leverage these solutions to measure, analyze, and predict growth. Our sales analytics teams use the results of this analysis to improve sales cycles. Our marketing teams use them to fine-tune campaigns such as emails. Talent Connect leverages them to match jobs to members and vice versa. Last but not least, our business operations teams leverage these solutions to assess the pulse of our business.
Data insights provide the analysis needed to enable business decisions. For example, typical company-level details such as total members, connections, and employees viewed give a good overview of growth and engagement on LinkedIn for a given company. As another example, the company cloud shows employee inflow and outflow. This kind of analysis equips our business with much-needed data input to keep improving.
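As a rough illustration of how company-level inflow and outflow could be derived, the sketch below counts job changes per company. The record schema (member_id, from_company, to_company), the function name `talent_flow`, and the company names are hypothetical, not LinkedIn's actual data model.

```python
from collections import Counter


def talent_flow(job_changes):
    """Count inflow, outflow, and net flow per company.

    job_changes: iterable of (member_id, from_company, to_company) tuples
    (a hypothetical schema used only for illustration).
    """
    inflow, outflow = Counter(), Counter()
    for _member_id, src, dst in job_changes:
        if src == dst:
            continue  # an internal move is not inflow or outflow
        outflow[src] += 1
        inflow[dst] += 1
    companies = set(inflow) | set(outflow)
    return {
        c: {"inflow": inflow[c], "outflow": outflow[c], "net": inflow[c] - outflow[c]}
        for c in companies
    }


changes = [
    ("m1", "Acme", "Globex"),
    ("m2", "Acme", "Initech"),
    ("m3", "Globex", "Acme"),
    ("m4", "Initech", "Initech"),  # internal move, ignored
]
flow = talent_flow(changes)
```

A negative `net` means a company lost more people than it gained over the period covered by the records.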
What kinds of solutions do we generate? If you look at this cycle, in step 1 we focus on segmenting our members with the help of standardization. This segmented data can be used for propensity modeling to fine-tune our target audience. In turn, this model helps us target our member base with a richer usage experience. Besides targeting, we leverage data for business forecasting to help our sales teams. We also analyze churn and calculate customer lifetime value to improve our customer base. For our solutions, we use tools and technology extensively: MPP systems such as Teradata and Aster, and distributed systems such as Hadoop for storage and processing. Besides MPP and Hadoop, Java and machine learning technologies are used for various processes.
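To make the churn and lifetime value step concrete, here is the textbook constant-retention CLV calculation, shown only as an example and not necessarily the model used at LinkedIn. It assumes a fixed margin m per period, a constant retention rate r (so churn is 1 - r), and a discount rate d, giving CLV = m * r / (1 + d - r). The function names are hypothetical.

```python
def clv_closed_form(margin, retention, discount):
    """Infinite-horizon CLV: sum over t >= 1 of margin * (r / (1 + d)) ** t."""
    if not 0 <= retention <= 1 or discount < 0:
        raise ValueError("need 0 <= retention <= 1 and discount >= 0")
    if retention == 1 and discount == 0:
        raise ValueError("series diverges when retention is 1 and discount is 0")
    return margin * retention / (1 + discount - retention)


def clv_truncated(margin, retention, discount, periods):
    """Same quantity summed explicitly over a finite number of periods."""
    ratio = retention / (1 + discount)
    return sum(margin * ratio ** t for t in range(1, periods + 1))


# Assumed inputs: margin 100 per period, 20% churn (80% retention), 10% discount rate.
value = clv_closed_form(100, 0.8, 0.1)
```

Comparing the closed form with the explicit sum is a quick sanity check; lowering churn raises CLV sharply because r appears in the denominator.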
Now I will spend some time going over our big data ecosystem.
We broadly divide our data challenges into three dimensions: volume, variety, and velocity. (Chart source: TDWI, courtesy of Phillip Russom.) Volume: we process terabytes of data in the form of records, transactions, tables, and files. Variety: we process various kinds of data, including structured data such as database tables, unstructured data such as some tracking data, and semi-structured data. Velocity: our infrastructure is built to incorporate various kinds of data streams, such as batch (files), near real-time (tracking data), and real-time (transactional data), as well as other streams.
To accommodate accelerating volume and increasing variety and velocity, we are building platforms and solutions that scale, simplify, and enable business decisions.
ERP data: transactional data, information. CRM: marketing, campaigns, usage, engagement. Web: engagement, pathing, social data.
Analyses: What happened? (BI and reporting); real-time monitoring of key business trends; predictive analyses.
Teradata too small.
Three major dimensions of data empower analytics:
Behavioral data: site engagement, OL transactions, searches, navigation paths, RFM, comments, discussions, …
Demographic data: location, gender, title, function, seniority, education, …
Social data: connections, co-views, sentiment tracking, NPS, follows, endorsements, forwards, comments, shares, …
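RFM (recency, frequency, monetary value) appears among the behavioral signals above. A minimal rank-based RFM scoring sketch follows; the `Member` fields, the three score bins, and the sample data are assumptions for illustration, not LinkedIn's definitions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Member:
    member_id: str
    last_active: date  # hypothetical field: last site visit
    visits: int        # hypothetical field: visits in the period
    spend: float       # hypothetical field: revenue in the period


def rank_score(values, reverse=False, bins=3):
    """Map each value to a 1..bins score by rank; the last-ranked values get the top score."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * bins // len(values) + 1
    return scores


def rfm_scores(members, today, bins=3):
    recency_days = [(today - m.last_active).days for m in members]
    # Fewer days since last activity is better, so sort days in descending order.
    r = rank_score(recency_days, reverse=True, bins=bins)
    f = rank_score([m.visits for m in members], bins=bins)
    mon = rank_score([m.spend for m in members], bins=bins)
    return {m.member_id: (r[i], f[i], mon[i]) for i, m in enumerate(members)}


members = [
    Member("a", date(2013, 1, 30), 20, 500.0),
    Member("b", date(2013, 1, 10), 5, 50.0),
    Member("c", date(2012, 12, 1), 1, 0.0),
]
scores = rfm_scores(members, today=date(2013, 2, 1))
```

Rank-based bins keep the scores comparable across segments of different sizes; quantile cut-offs computed on a larger population would be the usual alternative.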
In our high-level data flow architecture, user interaction with the application generates various data sets: nearline lookup data, an online data store that maintains user transactions such as profile information, and offline data in the form of web logs. All of these data sets are, in turn, centralized in an offline data store. As you can see from this high-level data flow, no single tool or technology can handle all of these needs. Hence, we have to build our own combination of tools and technologies to meet specific requirements.
What do we use for our data stack? As you can see from this slide, we use a mix of commercial tools such as Teradata and Oracle and open-source technologies such as Hadoop, Kafka, and Voldemort. On the next slide, we will look into where and how these tools are used.
LinkedIn leverages open-source tools, builds its own, and contributes to open source. Transactional data such as member profile data is maintained in Oracle and Espresso.
For nearline serving, LinkedIn leverages Voldemort as a distributed key-value store, whereas D-Graph is used as a distributed graph engine.
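Dynamo-style key-value stores such as Voldemort spread keys across servers using consistent hashing. The toy ring below illustrates that idea only; the `HashRing` class, its virtual-node count, and the node names are hypothetical and do not reflect Voldemort's API or its actual partitioning scheme.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring; illustrative only, not Voldemort's API."""

    def __init__(self, nodes, vnodes=100):
        self._nodes = list(dict.fromkeys(nodes))  # distinct nodes, order preserved
        ring = []
        for node in self._nodes:
            # Each physical node owns several virtual points to smooth the key distribution.
            for v in range(vnodes):
                ring.append((self._hash(f"{node}#{v}"), node))
        ring.sort()
        self._ring = ring
        self._points = [h for h, _ in ring]

    @staticmethod
    def _hash(key):
        # First 8 bytes of MD5 as an unsigned integer position on the ring.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def preference_list(self, key, replicas=2):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        replicas = min(replicas, len(self._nodes))
        i = bisect.bisect(self._points, self._hash(key))
        chosen = []
        while len(chosen) < replicas:
            node = self._ring[i % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


ring = HashRing(["node-a", "node-b", "node-c"])
owners = ring.preference_list("member:42")  # two distinct replica nodes
```

The design payoff: when a node is added, only the keys that now fall just before its points move to it, instead of nearly every key being reshuffled as with `hash(key) % n`.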
For pipelines, we leverage Kafka and Databus to transport data from the online stores and web logs to the offline data store.
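Kafka models each topic partition as an append-only log that consumers read by offset. The in-memory sketch below shows that pattern feeding an offline store; `AppendOnlyLog` and `OfflineLoader` are hypothetical names, and none of this is Kafka's or Databus's client API.

```python
class AppendOnlyLog:
    """Toy stand-in for one log partition: records are only appended, never modified."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=100):
        return self._records[offset:offset + max_records]


class OfflineLoader:
    """Hypothetical consumer that copies log records into an offline store in batches."""

    def __init__(self, log):
        self.log = log
        self.offset = 0          # next offset to read; each consumer tracks its own
        self.offline_store = []  # stands in for an offline warehouse table

    def poll(self, max_records=100):
        batch = self.log.read(self.offset, max_records)
        self.offline_store.extend(batch)
        self.offset += len(batch)
        return len(batch)


log = AppendOnlyLog()
first_offset = log.append({"event": "page_view", "seq": 0})
for i in range(1, 5):
    log.append({"event": "page_view", "seq": i})
loader = OfflineLoader(log)
```

Because each consumer keeps its own offset, a batch loader and a monitoring consumer can read the same log independently and at different speeds.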
For data analysis and reporting, we leverage Hadoop and Teradata systems, which process large data sets to enable machine learning and analytics.
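On Hadoop, aggregations such as page views per member are typically expressed as MapReduce jobs. The single-process Python sketch below mimics the map, shuffle, and reduce phases; the log line format `<timestamp> <member_id> <page>` and all function names are assumptions, and a real job would use Hadoop's own APIs.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (member_id, 1) for each well-formed log line (hypothetical format)."""
    parts = line.split()
    if len(parts) == 3:
        _timestamp, member_id, _page = parts
        yield member_id, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, sum(values)


def run_job(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


logs = ["t1 m1 /home", "t2 m2 /jobs", "t3 m1 /profile", "malformed line"]
views = run_job(logs)
```

Malformed lines are skipped in the map phase rather than failing the whole job, which matters when processing large volumes of raw web logs.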
Many companies are stumbling blindly into social media marketing without a measurement strategy. Measurement is king. The first email campaign using the new model was launched with half of the regular campaign volume. Within 2 hours, it triggered the NOC alerting system because order volume had doubled on a week-over-week basis. Within 9 hours, the new model had already surpassed, on new subscription acquisitions, the old model, which had run for 14 days with a 7-day reminder email. Within the 7 days, we saw a 300+% lift from the Gen model and a 480+% lift from the Sales model. The email open rate of the new model outperformed that of the old model. So far, we have not seen any increase in the opt-out rate. In fact, at a high level, we have observed a slight decrease in the opt-out rate.
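Lift is conventionally the relative improvement of a treatment rate over a baseline rate, so a 300% lift means about 4x the baseline. The sketch below assumes that definition and uses made-up counts; it normalizes by emails sent, which matters here because the new campaign used only half the regular volume. The function name is hypothetical.

```python
def lift_pct(conv_new, sent_new, conv_old, sent_old):
    """Relative lift (%) of the new campaign's conversion rate over the old one's."""
    if sent_new <= 0 or sent_old <= 0:
        raise ValueError("send counts must be positive")
    rate_new = conv_new / sent_new
    rate_old = conv_old / sent_old
    if rate_old == 0:
        raise ValueError("baseline rate is zero; lift is undefined")
    return (rate_new - rate_old) / rate_old * 100.0


# Made-up counts: the new model went to half as many members as the old one.
lift = lift_pct(conv_new=200, sent_new=5_000, conv_old=100, sent_old=10_000)
```

Comparing raw conversion counts (200 vs. 100) would understate the effect; per-send rates (4% vs. 1%) give the 300% lift.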
With 2M companies and thousands of salespeople, how do we prioritize sales? We predict which accounts to pursue and how much revenue they will spend. Within each account, we predict who can make the decision, based on a mix of behavior and engagement. Within LinkedIn, we predict who has the highest likelihood of closing the deal and who can be leveraged to close it.
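One simple way to combine the two account-level predictions is to rank accounts by expected revenue, the probability of closing times predicted spend. This is an assumed scoring rule for illustration, not LinkedIn's method; the `Account` fields and numbers are hypothetical model outputs.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    close_probability: float  # hypothetical propensity-model output in [0, 1]
    predicted_spend: float    # hypothetical revenue-model output


def expected_revenue(account):
    return account.close_probability * account.predicted_spend


def prioritize(accounts, top_n=None):
    """Sort accounts by expected revenue, highest first; keep the top_n if given."""
    ranked = sorted(accounts, key=expected_revenue, reverse=True)
    return ranked[:top_n]


accounts = [
    Account("A", 0.9, 10_000),   # likely to close, small deal
    Account("B", 0.2, 100_000),  # unlikely to close, large deal
    Account("C", 0.5, 30_000),
]
ranked = prioritize(accounts)
```

Ranking by expected value rather than by probability alone lets a large but uncertain account (B) outrank a small, near-certain one (A).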
4 principles at LinkedIn
4 phases of analytics. We’d like to predict the future.
As our CEO says, always look for the next play! Yes! Web 3.0 is all about data!
Thank you all! We are growing and need more professionals. We are hiring! Please reach out to me if you have any questions.
An example of using data to improve sales: Which account? Who? How? (Steps 1, 2, and 3), drawing on identity data, social data, and behavioral data.
How do we provide 500 to 1000X impact? An insights portal for the sales organization that is:
1. Easy: quickly find the right information
2. Fast: a few seconds' response time for most insights
3. Scalable: 2M+ accounts/prospects
4. Accurate: mimics an analyst/data scientist