The typical definition of Big Data (by IBM) is Volume, Velocity and Variety, but add a 4th attribute: Variability (thanks to Bryan Smith from MSDN).
Volume: Big data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.
Velocity: Often time-sensitive, big data must be used as it is streaming into the enterprise in order to maximize its value to the business.
Variety: Big data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more.
Variability: Defined as the differing ways in which the data may be interpreted. Differing questions require differing interpretations.
There’s a big chasm when shifting from Excel to the next level. Small Data is fundamentally the same as VisiCalc… invented nearly 35 years ago!
BI is about the back office: the roots we don’t see but that support the whole business. Web analytics used to be only about the front end: what is visible, and how people interact with the business. But there was an obvious and growing need to connect to the back office as well.
In the pre-big data era, statistical science was necessary to make up for the inherent limitations of incomplete data samples. Statisticians and scientists were forced to cleanse, hypothesize, sample, model, and analyze data to arrive at contingent conclusions. Big Data becomes more akin to a problem in algorithm and architecture design than one of learning and quantifying uncertain knowledge using statistical science. (Too Big to Ignore, Deloitereview.com)
Most important skills:
• Understanding (and helping to articulate) an organization’s questions, problems or strategic challenges, and then translating them into the design of one or more data analysis projects
• Better to have an approximate answer to the right question than a precise answer to the wrong question (John Tukey)
• Creation of innovative “data features”
As rich and detailed as practical given the business context
VO: Case specific. Heavy math. Tough stuff. Elegantly complex. Beautifully simple. What does it mean? Huge opportunity. There are many different calculations and approaches. Basically, it is about understanding a customer’s potential value (note: can we change “students” to “customers”? It would make the example more anonymous) and the likelihood that they will meet that potential (churn).
Customer potential value starts off with a fairly even distribution that skews toward a base of lower-value customers. As churn begins, many potentially higher-value customers drop out, causing a very skewed value distribution.
We use all back-office data to do this. We create 5 quintiles of total and potential revenue and see how many customers account for each quintile. The purpose here is to show that some customers are simply worth much more than others. Many factors cause this, and the gap between potential and actual value is churn at play.
The quintiles are literally all revenue added up and divided by 5: 100 million in revenue makes five 20 million buckets. We do this for both actual and potential revenue. Potential is estimated based on customer behavior; actual comes from the cash register.
The next 2 columns are the % of students that fall in each 20 million bucket. From a potential perspective, more customers could drive higher value, but churn occurs, and that is why the actual bar is more skewed.
Another explanation: some customers have the potential to spend 100 dollars, but due to churn they only spend 40. That is why the bar charts change from potential to actual.
Also, potential is twice as big as actual in this case: 50% of revenue is lost.
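The quintile calculation described in these notes can be sketched in Python. This is a minimal sketch under stated assumptions: customers are ranked from highest to lowest revenue, total revenue is split into five equal buckets, and a customer whose revenue straddles a bucket boundary is counted in the bucket where their cumulative revenue starts (the notes don’t say how boundaries are handled). The function names `value_tier_quintiles` and `revenue_lost_share` are hypothetical. Running it once on actual revenue and once on estimated potential revenue would give the two customer-allocation columns.

```python
from typing import Dict, List


def value_tier_quintiles(revenue_by_customer: Dict[str, float], tiers: int = 5) -> List[float]:
    """Share of customers (0..1) falling in each equal-revenue tier, top tier first."""
    values = sorted(revenue_by_customer.values(), reverse=True)  # highest value first
    total = sum(values)
    if total <= 0:
        raise ValueError("total revenue must be positive")
    bucket_size = total / tiers  # e.g. 100 million / 5 = 20 million per bucket
    counts = [0] * tiers
    cumulative = 0.0
    for value in values:
        # Assumption: a customer belongs to the bucket where their revenue starts.
        index = min(int(cumulative // bucket_size), tiers - 1)
        counts[index] += 1
        cumulative += value
    return [count / len(values) for count in counts]


def revenue_lost_share(potential_total: float, actual_total: float) -> float:
    """Fraction of potential revenue that was not realized (attributed to churn)."""
    if potential_total <= 0:
        raise ValueError("potential revenue must be positive")
    return 1.0 - actual_total / potential_total


if __name__ == "__main__":
    # Hypothetical actual revenue per customer.
    actual = {"a": 30, "b": 20, "c": 15, "d": 10, "e": 8, "f": 7, "g": 5, "h": 3, "i": 2}
    print([round(s, 2) for s in value_tier_quintiles(actual)])  # [0.11, 0.11, 0.11, 0.22, 0.44]
    print(revenue_lost_share(potential_total=200.0, actual_total=100.0))  # 0.5
```

In this toy data, one customer fills the top 20% bucket while four customers share the bottom one, which is the skew the chart is meant to reveal. When potential revenue is twice actual revenue, the lost share is 0.5, matching the 50% figure in the notes.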
There is no automatic, purely algorithmic way to extract the right islands of information from oceans of raw data. It requires a combination of domain knowledge, creativity, critical thinking, an understanding of statistical reasoning, and the ability to visualize and program with data. (Putting the science in data science, deloitereview.com)
“A wealth of information creates a poverty of attention” (Herbert Simon, quoted in deloitereview.com)
“Analytics initiatives ultimately do not begin with data but with clearly articulated problems to be addressed and opportunities to be pursued. More data does not guarantee better decisions.”
[Figure: chart over a monthly axis (Jan to Dec, spanning several years) labelled Big Data, Business Intelligence and Web Analytics] @SHamelCP
Customer Value Modeling
What are customers worth?
$\mathrm{LTV} \approx AR = SR \times p + PR \times (1 - p)$
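The slide doesn’t define its symbols. The sketch below assumes p is the probability that a customer churns, PR is the customer’s potential revenue if retained, and SR is the revenue actually realized when the customer churns, which makes AR a probability-weighted (expected) revenue. The function and parameter names are hypothetical.

```python
def expected_actual_revenue(sr: float, pr: float, p: float) -> float:
    """AR = SR * p + PR * (1 - p), used on the slide as an approximation of LTV.

    Assumed meanings (the slide does not define them):
      p:  probability that the customer churns, between 0 and 1
      SR: revenue realized from the customer if they churn
      PR: potential revenue from the customer if they are retained
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return sr * p + pr * (1.0 - p)


if __name__ == "__main__":
    # Hypothetical customer: 100 dollars of potential, 40 if they churn, 50% churn risk.
    print(expected_actual_revenue(sr=40.0, pr=100.0, p=0.5))  # 70.0
```

Under these assumptions, a customer with 100 dollars of potential who spends only 40 after churning, with a 50% churn probability, is worth 40 × 0.5 + 100 × 0.5 = 70. Lowering p moves AR toward PR, which is why reducing churn closes the gap between actual and potential value.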
Value tiers! All add value: some are better investments than others
[Figure: Value Tier Quintiles. Columns: Total Revenue and Customer Allocation of Potential Value; Total Revenue and Customer Allocation of Actual Value. Callouts: 60% of revenue, 29% of customers; 40% of revenue, 71% of customers. Some information withheld.]
What to do…
• 29%: Who are they? How can you attract more of them?
• 71%: Who are they? How much are you spending to acquire them?
Cheating churn
Certain factors drive churn… A multivariate model is used to measure the factors influencing the customer profile:
• Demo: Female, Male, Age (+)
• Education: Doctorate, Master, Bachelor, Associate, Diploma, Certificate
• Factors: A, B, H, E, L, X (some information withheld)
• Location: Zip, Income (+)
Isolate the Value Targeting factors that can be used to attract higher-value segments!
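The slide doesn’t name the model. As one hedged illustration, an ordinary least-squares regression of customer value on the profile factors (with categorical factors such as gender or education dummy-coded as 0/1 columns) yields one coefficient per factor, and a positive coefficient corresponds to a “(+)” marker on the slide. The sketch solves the normal equations in plain Python; the function name, variables and data are hypothetical.

```python
from typing import List, Sequence


def ols_coefficients(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations.

    Returns [b0, b1, ..., bk]; b0 is the intercept.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("factors are collinear (keep all but one dummy per category)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


if __name__ == "__main__":
    # Hypothetical customers: (age, income in thousands, female as 0/1) -> customer value.
    factors = [(25, 40, 1), (35, 60, 0), (45, 50, 1), (30, 80, 0), (50, 90, 1), (40, 30, 0)]
    value = [130, 150, 175, 150, 205, 145]
    print([round(b, 3) for b in ols_coefficients(factors, value)])  # [50.0, 2.0, 0.5, 10.0]
```

The toy data was generated from value = 50 + 2·age + 0.5·income + 10·female, so the fit recovers those coefficients; positive age and income coefficients mirror the slide’s “Age (+)” and “Income (+)”. With real data, a statistics library would also report standard errors, which matter before acting on a factor.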
Channel Marketing Efficiency Grid
Use Value Targeting to shift spend away from inefficient channels and go after higher-value prospects.
[Figure: bubble chart plotting Channel Conversion against Channel Life Time Value, with a Feeder Channel label. Bubble size represents the number of customers, alumni and donors added by channel. Some information withheld.]
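A hedged sketch of the grid logic: each channel gets a conversion rate and an average lifetime value, and is placed in a quadrant relative to the median of each measure. The median cut-offs, the quadrant labels and the `Channel` fields are assumptions for illustration, not the deck’s method.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Channel:
    name: str
    prospects: int        # leads or visits the channel reached (hypothetical field)
    customers_added: int  # bubble size on the grid
    total_ltv: float      # summed lifetime value of the customers it added

    @property
    def conversion(self) -> float:
        return self.customers_added / self.prospects if self.prospects else 0.0

    @property
    def avg_ltv(self) -> float:
        return self.total_ltv / self.customers_added if self.customers_added else 0.0


def efficiency_grid(channels: List[Channel]) -> Dict[str, str]:
    """Label each channel by its quadrant relative to median conversion and median LTV."""
    conv_cut = median([c.conversion for c in channels])
    ltv_cut = median([c.avg_ltv for c in channels])
    labels: Dict[str, str] = {}
    for c in channels:
        high_conv = c.conversion >= conv_cut
        high_ltv = c.avg_ltv >= ltv_cut
        if high_conv and high_ltv:
            labels[c.name] = "invest"
        elif high_ltv:
            labels[c.name] = "improve conversion"
        elif high_conv:
            labels[c.name] = "volume, low value"
        else:
            labels[c.name] = "shift spend away"
    return labels


if __name__ == "__main__":
    grid = efficiency_grid([
        Channel("search", 1000, 100, 50000.0),
        Channel("email", 500, 75, 15000.0),
        Channel("events", 200, 10, 8000.0),
        Channel("display", 2000, 40, 4000.0),
    ])
    print(grid)
    # {'search': 'invest', 'email': 'volume, low value', 'events': 'improve conversion', 'display': 'shift spend away'}
```

Medians keep the cut-offs relative to the current channel mix; a team could instead use target values (for example, a minimum acceptable LTV) so that the quadrants don’t shift every time a channel is added.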
WHAT ABOUT YOUR FUTURE?
[Diagram: Analytics Center of Excellence, linking Business Strategy & Goals with Analysis and Enabling Capabilities. Flows: Provides: Actionable insight & recommendations; Communicate: Business requirements & objectives; Technological capabilities & constraints; Supply: Means, tools and data.]
• Analysis: Statistical analysis, Problem solving, Synthesis of data, Communication through reports & dashboards
• Enabling Capabilities: Web development, Information architecture, User Experience, Instrumentation & BI
NEXT STEPS
• Analytics = Context + Data + Creativity
• Small Data is readily available
• Cautious optimism
• Define your future!
Stéphane Hamel
Director of Strategic Services
Quebec City, QC, Canada
shamel@cardinalpath.com
@SHamelCP
Tel: 418-454-2637
www.CardinalPath.com
Additional info
Chime in at http://online-behavior.com/analytics/big-data
Gartner Hype Cycle
[Figure: Hype Cycle phases: Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity]
A visualization of all the Hype Cycle data, January 26th, 2013, by Mark Raskino: http://blogs.gartner.com/hypecyclebook/2013/01/26/a-visualization-of-all-the-hype-cycle-data/
Attributes of “Big Data”
Big data spans three dimensions:
• Volume: Big data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.
• Velocity: Often time-sensitive, big data must be used as it is streaming into the enterprise in order to maximize its value to the business.
• Variety: Big data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more.
Bryan Smith of MSDN adds a fourth “V”:
• Variability: Defined as the differing ways in which the data may be interpreted. Differing questions require differing interpretations.
Perspective for Digital Analysts
Areas of immediate interest for digital analysts:
• Acquisition of data
• Serialization and sanitization of data
• Storage
• Servers (cloud or traditional)
• NoSQL (Hadoop)
• MapReduce
• Processing
• Visualization
• Predictive
• Natural Language Processing (NLP)
• Machine Learning
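To illustrate the MapReduce item, the sketch below counts page views per URL from click-stream log lines using the three MapReduce phases (map, shuffle, reduce) in a single Python process. It shows the programming model only; it is not Hadoop’s API, and the log format `timestamp visitor_id url` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (url, 1) for a log line of the hypothetical form 'timestamp visitor_id url'."""
    parts = line.split()
    if len(parts) == 3:  # skip malformed lines
        yield parts[2], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key; here, sum the page-view counts."""
    return key, sum(values)


def page_views(log_lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in log_lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = [
        "2013-03-01T10:00 v1 /home",
        "2013-03-01T10:01 v1 /pricing",
        "2013-03-01T10:02 v2 /home",
        "malformed",
    ]
    print(page_views(logs))  # {'/home': 2, '/pricing': 1}
```

On a real cluster the framework runs many map and reduce tasks in parallel and performs the shuffle across machines; the map and reduce functions themselves can stay this simple.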