Data Science Highlights
Data Scientist
Square - San Francisco Bay Area
Job Description
Square is hiring a Data Scientist on our Risk team. The Risk team at Square is responsible for enabling growth while mitigating financial
loss associated with transactions. We work closely with our Product and Growth teams to craft a fantastic experience for our buyers and
sellers.
Desired Skills & Experience
As a Data Scientist on our Risk team, you will use machine learning and data mining techniques to assess and mitigate the risk of every
entity and event in our network. You will sift through a growing stream of payments, settlements, and customer activities to identify
suspicious behavior with high precision and recall. You will explore and understand our customer base deeply, become an expert in
Risk, and contribute to a world-class underwriting system that helps Square provide delightful service to both buyers and sellers.



To accomplish this, you are comfortable writing production code in Java and conducting exploratory data analysis in R and Python. You
can take statistical and engineering ideas from prototype to production. You excel in a small team setting and you apply expert
knowledge in engineering and statistics.



Responsibilities
1. Investigate, prototype and productionize features and machine learning models to identify good and bad behavior.
2. Design, build, and maintain robust production machine learning systems.
3. Create visualizations that enable rapid detection of suspicious activity in our user base.
4. Become a domain expert in Risk.
5. Participate in the engineering life-cycle.
6. Work closely with analysts and engineers.
Requirements
1. Ability to find a needle in the haystack. With data.
2. Extensive programming experience in Java and Python or R.
3. Knowledge of one or more of the following: classification techniques in machine learning, data mining, applied statistics, data
visualization.
4. Concise verbal and written articulation of complex ideas.
Even Better
1. Contagious passion for Square’s mission.
2. Data mining or machine learning competition experience.
Company Description
Square is a revolutionary service that enables anyone to accept credit cards anywhere. Square offers an easy to use, free credit card
reader that plugs into a phone or iPad. It's simple to sign up. There is no extra equipment, complicated contracts, monthly fees or
merchant account required.



Co-founded by Jim McKelvey and Jack Dorsey in 2009, the company is headquartered in San Francisco.
Sense Maker Segment
Sense makers need to create and/or employ insights to accomplish
their business goals and satisfy their responsibilities.
These insights emerge from independent and collaborative discovery
efforts that involve direct interaction with discovery applications, and
participation in discovery environments.
Insight Consumer
Analyst
Casual Analyst
Data Scientist
Analytics Manager
Problem Solver
Data Scientist: Profile
Data Scientist
Data Scientist / Senior Research Scientist
Data Scientists work with other members of the Data science team, using emerging methods and tools to engage with ‘Big
Data’ from a variety of external and internal sources. Data Scientists aim to generate actionable insights that transform the
organization; enhance existing products, services and operations; and identify, define and prototype new data-driven
products, services, and offerings.
They have advanced analytical skills and/or a specialized educational background, and rely on open-source and custom-
created tools, to address the ad-hoc and open-horizon questions the Data Science team takes on. Data Scientists collaborate
with Insight Consumers, evolving and publishing insights and prototypes of new offerings.
Business Goals & Work Setting
• Create new data-driven products, services, business opportunities
• Transform the business with insights derived from Big Data
• Create effective tools and infrastructure for the data science group
and other analytical groups within the organization
• Develop prototypes based on proprietary or open source tools
• Prototype new ways to visualize and understand data relationships
• May work within a business unit, providing analytical capability to
that unit only, or a centralized Data Science group
Discovery Needs
• Solves complex, critical problems and significant, unique issues
• Has numerous, dynamic, ill-formed questions with
unpredictable needs for data, visualization, and discovery capabilities

Discovery Tools
• Open source tools and platforms for big data, ETL, visualization,
analysis, statistics: Hadoop, Cassandra, Kafka, Voldemort
• Open source algorithms and languages: R, Hive, Pig
• Custom-developed analytical tools
Engagement w/ Discovery Applications
• Creates custom discovery applications to suit their own needs
• Application lifecycle involvement: rolls their own from
scratch, iterates and then publishes to wider audiences /
productizes
• Original author of all discovery solution elements: data / data
sets, information models, discovery applications and
workspaces
• Shares / publishes insights to decision-making groups &
social forums in the business
Collaboration
• Works with Engineers and Software Architects to create prototypes
and products
• Collaborates with Data Scientists on ill-formed questions
Skills & Expertise
• Data management, analytics modeling and business analysis
• Prototyping / software engineering
• Discovery: advanced statistics, quantitative and qualitative
analysis, machine learning, data mining, natural language
processing, computational linguistics, broad knowledge of applied
mathematics, statistical methods and algorithms
Profiles & Discovery Problem Spectrum
[Figure: spectrum of discovery problems from ill-formed to well-formed, with profiles plotted: Data Scientist, Analyst (all), Casual Analyst, Problem Solver]
The ‘Conway Model’
http://upload.wikimedia.org/wikipedia/commons/4/44/DataScienceDisciplines.png
http://nirvacana.com/thoughts/wp-content/uploads/2013/07/RoadToDataScientist1.png
What sort of animal?
They seem different than analysts:
• problem set
• relationship to discovery tools
• skills and professional profile
• discovery / analytical methods
• perspective
• workflow and collaboration
Are they? How?
Areas of Investigation
• Workflow
• Environment
• Organizational model
• Pain points
• Tools
• Data landscape
• Analytical practices
• Project structure
• Unmet needs
Interviews
Discussion Guide
Can you please walk me through a recent or current project?
a. How was the project initiated?
b. How defined was the business problem in the beginning? Did the problem change?
c. Where/who did you obtain data sets from? How did you make the decision?
d. Describe the data you used: What did the data sets look like? How big were they? Were they
structured or unstructured?
e. What tools or techniques did you use to do the analyses? Did they map to the specific steps you
mentioned just now?
f. How did you decide these were the tools/techniques to use? To what extent were these decisions
made by yourself and to what extent were they standardized by your group/team?
g. How did you present the results of your analyses? What tools did you use? What do you like and
dislike about your current tool set?
h. Which stage of this project was the most challenging? To what extent did the tools satisfy what you
intended to do? What features were lacking?
i. How much collaboration was there during each stage of the project?
i. Background and role of collaborators
ii. Collaboration modes
iii. Types of information shared
Thinking about the projects you have worked on, is there a common approach you take to address these
problems?
How did you decide on this approach/tools?
Transcripts & Recordings
Synthesis
Findings
Business Analytics (future) = Data Science (now)
Dana Data Scientist
Creates data-driven insights, offerings, and resources to transform the organization

Work Experience: 10 Years
Education: Ph.D. Statistics, MS Bio-Informatics
Job Title: Senior Data Scientist
Company: LinkedIn

“I’ll do whatever it takes – wrangle, extract, manipulate, analyze, experiment, prototype – to use data
to drive value & innovate”

Background
Dana is a Senior Data Scientist who has worked at LinkedIn for 5 years. Dana’s education includes a
Ph.D. in Statistics and an MS in Bio Informatics. Dana’s previous work includes positions in academic
research groups as a doctoral candidate and post-doc, as well as software engineering roles in the
Internet & technology industries.

Work Context
• Dana works with several other data scientists and her Analytics Manager on a centralized team
• Dana and her colleagues aim to create data driven insights, features, resources, and offerings that
deliver strategic value to LinkedIn
• Dana works with Analysts on other teams to define and create discovery tools, data sets, and
methods for use by their groups at LinkedIn.
• Dana & team are visible & well established within LinkedIn, and have a voice in product strategy
and operational context; they have a high degree of autonomy in defining data science projects
• Dana works with Insight Consumers to suggest and determine potential new data driven offerings
to prototype and evaluate.

Key Goals
• Leverage data to support the org mission
• Enhance products & services with data-driven insights and features
• Use data to identify new opportunities and prototype/drive new customer offerings
• Create useful data sets/streams, measures, & resources (e.g., data models, algorithms, etc.)

Activities
• Mines, analyzes, & experiments with data to identify patterns, trends, outliers, causal factors,
predictive models, & opportunities
• Defines and explains newly devised measurements, predictive models, & insights
• Compares effectiveness of operations at achieving company goals for engagement, growth,
data quality
• Produces & explores new data sets
• Collaborates with other data scientists to capture new data streams
• Prototypes new data driven site features/offerings
• Runs data based experiments to test/evaluate models, hypotheses & prototypes
• Communicates & explains analyses to colleagues & Insight Consumers

Typical Discovery Scenarios & Problems
• How can we leverage data to increase online engagement with LinkedIn?
• How should we measure engagement & what factors drive it?
• What aspects of a personal profile are most likely to encourage / discourage new connections
between people?
• How can we increase people’s activity and contributions to topical discussion groups?
• What factors drive the effectiveness of our marketing campaigns?
• Why did one of our marketing campaigns work exceptionally well?
• How can we leverage data to help recruiters identify and communicate effectively with qualified
and potentially available candidates?

Sample Workflow
• Summarize & Communicate: Review findings with colleagues; summarize, visualize, and
communicate key findings to Insight Consumers/decision makers
• Prototype & Experiment with data driven feature: How can we prototype/evaluate this w/out
disrupting the site?
• Gather Data & Analyze Results: Use descriptive, inferential, and predictive statistics to
evaluate results
• Analyze & Identify causal/predictive factors: Who are the best candidates to contact for a job
based on recruiter needs and profile content?

Tools
• Open source data manipulation, mining & analysis tools including R, Pig, Hadoop, Python, etc.
• Statistical packages such as SAS, SPSS, etc.
• Custom analytical tools built using open source components and languages

Pain Points
• Defining and capturing useful measures of online attention
• Getting all the data analytic tools to work together properly
• No current workflow support or tools for data wrangling, analysis, experimentation, and
prototyping

Wish List
• Effective tools to help experiment with and evaluate value / utility of features and activities
for users
• Ability to rapidly prototype data-driven features w/out risk of online service disruptions
Nature of sense making activity
Business Analytics: Intuitive, Manual, Gradual, Individual
Data Science: Empirical, Augmented, Accelerated, Cooperative*
The Essence
• Empirical perspective
• Business imperatives drive activities
• Analytical approach
• Recipe is always the same
• Engineering always present
• Data challenges are paramount
• Consume 60% - 80% of time and effort
• Data volumes range from huge to moderate (PB > MB)
• Domain often drives analysis
• Data scientists already have self-service
• Some new problems, many the same
• Use ‘advanced’ analytics, not conventional BA
• Innovate by applying known analyses to new data
• Current workflow fragmented across tools and data stores
• Success can be a model, product, insight, infrastructure, tool
State of the Discipline
A small set of formally constituted Data Science teams at major Internet and
technology companies (Facebook, Google, Microsoft, Yahoo, Twitter,
LinkedIn, eBay, Amazon) lead the field in most identifiable respects:
• maturity of practice - sophistication of methods, quality of infrastructure
• history and tenure as formal function / group
• business integration and impact
• internal and public visibility
• pace of innovation in methods, tools, architecture
• quality and rate of contributions to open source and other tools /
infrastructure
• role in the industry and public discourse on data science: visibility in
community, publication of experiments and findings, etc.
Tooling & Infrastructure
Leading shops have their own comprehensive and often home-built / heavily
customized data science environments, tools, infrastructure.
This infrastructure is aligned to the particulars of their domain and business.
Their data science environments are sometimes considerably more 'mature'
than those of other shops.
The large majority of existing data science teams and practices are 'followers'
of these leaders, in the sense that while they have idiosyncratic problems and
varying domains to address, they rely on innovation from the DS leaders to
guide the evolution of their data science practices.
Their environments reflect a mix of some purpose-built data science
components, and infrastructure extended / adapted from business analytic
needs such as BI.
Tooling & Infrastructure
Many organizations are establishing new data science capabilities. A minority
of these create new data science teams / practices from scratch without
building out other conventional analytical capabilities such as BI. They will
need new environments to support data science activities, and may leapfrog
older generations of analytic environment, following leaders by directly
creating new 'stacks' oriented more specifically for data science.
The majority of organizations are creating new data science capabilities by
building on existing analytical groups and functions. In terms of environments
and infrastructure, these organizations have existing analytical environments
aligned to BI and other business analytic functions, not specifically adapted to
data science needs. Cumulative investment in these environments can be
very high.
New teams will need new tools. Existing teams will need new tools to support
new discovery activities.
Berkeley Data Analytics Stack is the most visible open source 'platform' at the
moment. No interview participants mentioned it.
Organizational Model
Data science capability = provisioned via standard org models (ranging across
in house, external, centralized, embedded, etc.).
The ways data science teams and practice groups are managed and their
relationship to the orgs they are part of seems to be conventional / familiar.
We can summarize the landscape of organizational models for providing data
science capability by plotting the size of data science team / pool of resources
vs. the 'distance' from the problem / need.
Landscape reflects common patterns for specialized expertise.
This could shift over time as discovery maturity increases overall, first within
the analytics industry, then within the general business realm.
Discovery Problems
Discovery efforts are set in motion by Insight Consumers, not Data Scientists.
The success of efforts is gauged by Insight consumers. Insights are used by
the originating Insight Consumers, not other analysts, and rarely other Insight
Consumers.
Multiple hypotheses are often explored in parallel, supported by multiple data
sets / interim data products.
Useful reconstruction of analytical workflows requires a linear history of all steps
/ activities.
Discovery Problems
Data science resources - Individuals, projects, and teams - are always aligned
to business areas or strategic goals: e.g. the Content Insights team at
LinkedIn supports analytical goals related to LinkedIn's major push to enhance
its media presence and role in media.
At large scales of group, this inverts - for example within a company,
communities of practice are aligned to a discipline, and will include members
whose activities span the needs of all the business units.
No analytical efforts begin completely open-ended, with no idea of the nature
or import of resulting insights.
There is almost always a hypothesis, or more than one. (Even in more
academic / research oriented settings, there is no basic research - all
investigations are purposive and grounded in defined business intent.)
PROBLEM NATURE
• Well-defined
• Explicit form: Why, What, and How questions
• Implicit form: which question
• Hypotheses are driven by domain knowledge or work experience
• Not very different from the problems business analysts address

Businesses address the same problems they have been working on, which are
determined at the very beginning, before resources are allocated. Data scientists
do not necessarily contribute to initiating new problems.
[Figure: Outcomes for Analysts and Data Science; labels shown: Insight, Model, Data Product, Product]
Skills Portfolio
Data scientists use three kinds of languages: analysis (R, Matlab), scripting
(Python, Perl), data processing (SQL, Pig).
Analytical environments should allow integration of languages / capabilities they
offer.
Every analyst has a preferred language / method and defaults to using it
for analytical efforts. This is true within centralized analytical teams as well.
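
As a rough illustration of integrating a data processing language with a scripting / analysis language in one environment, the sketch below runs SQL from Python and hands the rows to a small analysis step. The `payments` table, its columns, and the sample rows are hypothetical and exist only for this example; the standard-library `sqlite3` module stands in for whatever data store a team actually uses.

```python
import sqlite3
import statistics

# Hypothetical in-memory table standing in for a real data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (seller TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("a", 10.0), ("a", 30.0), ("b", 5.0), ("b", 7.0), ("b", 9.0)],
)


def amounts_by_seller(connection):
    """Data processing step: extract rows with SQL, group them in Python."""
    grouped = {}
    rows = connection.execute(
        "SELECT seller, amount FROM payments ORDER BY seller, amount"
    )
    for seller, amount in rows:
        grouped.setdefault(seller, []).append(amount)
    return grouped


def summarize(grouped):
    """Analysis step: per-seller (mean, median) of the extracted amounts."""
    return {
        seller: (statistics.mean(values), statistics.median(values))
        for seller, values in grouped.items()
    }


summary = summarize(amounts_by_seller(conn))
print(summary)
```

Keeping the extraction and analysis steps as separate functions is one possible way to let each analyst swap in a preferred language or library for a single stage.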
Skills
Discovery Maturity
• Discovery is poorly understood and little recognized as a capability. It is rarely
mentioned by any of the Data Science / Analytics professionals spoken with.
When mentioned, it is seen as a small-scale activity and / or a desired
outcome of particular projects, not something the organization needs to be
able to do in an ongoing / comprehensive / large-scale fashion such as
understanding customers.
• Data scientists understand their own challenges in terms of what stages /
aspects of a data-centric workflow require greatest time, effort, or present most
complexity or potential for introducing uncertainty / ambiguity into the efforts.
Broader framings are the need for or desire to work on data-driven products,
or transform and improve business through offering data-centered insights.
• Product-centric data scientists (aim directly at making data-driven offerings)
are a small minority of the active community. Many more are engineers with
strong data skills, and many more are analysts trying to acquire data science skills
/ perspective.
Supporting Factors
• Regardless of particulars, the core ingredients remain the same: analytical
skills and perspective, domain knowledge, engineering / tooling skills and
perspective
• In data science practices, analysis is always enabled by engineering - either
localized to the data science team, or centrally provided via IT.
• In BI practices, analysis is always enabled by IT and systems consultants /
integrators (in house or external).
• Leading DS groups rely on a number of hybrid approaches to support data
cleansing and the evaluation of models, insights, and results - e.g. crowd
source prep of data and checking of results for prototypes and experiments.
• Data scientists rarely productionize code, analytical workflows, analytical tools.
Engineers / IT convert 'prototype' artifacts created by data scientists into
production code / tools.
Perspective
Analytical
The analytical perspective is the center of definition for all analytical roles.
Contrast with engineers, who "make stuff". Analytical roles figure things out
for some purpose: whether a model to inform a product prototype or provide
insight.
Empirical
The empirical perspective is distinct from the analytical perspective, and
marks 'true' data scientists. This perspective revolves around framing and testing
hypotheses formally and informally, often requires validation and interrogation
of experimental methods and results by others, and expects a significant degree of
transparency at (all) stages of the analytical effort.
Cooperation and Collaboration
• Discovery efforts are structured as individual efforts - insights come from
individual analytical engagement with data sets.
• Collaboration between analysts is asynchronous.
• Diversity of analytical tools / languages in practice = barrier to cooperation and
collaboration.
• There is little re-use of analytical insights by analysts to further other efforts.
• When tools and/or problem domains are stable / known, analysts create
individual and group assets for reuse - e.g. R script libraries, code snippets for
SAS, templates for data set file formats and structures
• Intermediate work products created during analytical work (data sets / subsets,
code, analytical scripts, algorithms, interim results, hypotheses) are often perceived as
irrelevant or throwaway, if not outright wrong. Little investment is made
to annotate / preserve intermediate work products for individual or group re-use,
sharing, review.
THE MANY SHADES OF COLLABORATION
Independent: Have-it-all type data scientist (I know, I design & I implement)
Linear: Complementary (Analysts know, data scientists design, engineers implement)
Project-based: The missing piece (Data scientists lead or support engineers)
Consultancy: From abstract to concrete (Some data scientists know & design, some other
data scientists implement)
Data Landscape
• The physical location of data - where stored / what environment - is a
significant cost factor for almost all aspects of analytical work.
• Distributed data (managed / located in multiple stores) increases costs for
many individual steps in analytical workflows.
• Distributed data costs often = barrier to conducting insightful analysis using
multiple techniques / steps. Default to basic / simple analysis to avoid high
effort / low probability of success.
• For analysts with low levels of db / data wrangling skill, even marginal
distributed data costs = preventative barrier for engaging with data.
• Most analysts reported having to migrate all of the data sets into the same
data processing framework to begin analysis. [If all the data were in one
place...]
DATA NATURE
• Messy: various forms (Web logs, web pages, genome data, sales revenues….)
• Scattered: Data scientists have to search in the wild (outside of enterprise
databases)
• Started “Big”, ended “Lean”: Meaningful data units are small in size
• Standardization is key to all data science work: why engineers become data scientists

Data scientists are “data foragers” and “data format equalizers”. They have the ability
to manipulate large data sets and gradually narrow the data sets down to the exact
units needed for analysis.
Algorithms and Analytical Tools
• Well-known algorithms and methods are used to plan and structure
experiments, discover insights, drive the creation of new models, evaluate the
effectiveness of new models & products.
• The algorithm and method are often determined by domain, such as TF-IDF
for IR and Smith-Waterman for bioinformatics.
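
To make one of the domain-specific methods named above concrete, here is a minimal TF-IDF sketch. It assumes one common weighting (term frequency normalized by document length, times log(N / document frequency)) and naive whitespace tokenization; other variants exist, and the sample documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length,
    idf = log(number of documents / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical mini-corpus.
docs = ["risk model for payments", "risk model for sellers", "genome data"]
scores = tf_idf(docs)
for doc, weights in zip(docs, scores):
    top_term = max(weights, key=weights.get)
    print(f"{doc!r}: highest-weighted term is {top_term!r}")
```

Terms that appear in fewer documents receive a higher weight, which is why "payments" outweighs "risk" in the first document.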
PROCESS NATURE
• Wicked: Solutions are rarely pre-defined
• Iterative three-step cycle: Data collection, data cleansing, & data analysis
• Trial-and-error: Hypothesis revision, hypothesis validation, & data recollection
• Ad-hoc analysis and chance encounters

Data scientists provide new perspectives to address old problems. The path to the
solution is usually exploratory. But the goal has always been clear and pre-defined.
Data Science Workflows
http://strata.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html
Data Science Workflows
Data Science Workflow
• Frame problem / goal of effort
• Identify and extract data to be used in effort from whole corpus / totality of
available data
• Exploratory identification and selection of working data for use in
experiments
• Define experiment(s): hypothesis / null hypothesis, methods, success criteria
• Derive insight(s)
• Wrangle, process, visualize, interpret
• Codify / create new model reflecting insights & outcomes from experiments
• Validate new model(s)
• Provision training data
• Train new model
• Validation and outcome of training model
• Hand-off for implementation on production systems / as production code
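
The "define experiment" step above (hypothesis / null hypothesis, methods, success criteria) can be sketched with an exact permutation test, a technique one interviewee mentions later in this deck. This is only one possible method; the function name, the sample measurements, and the 0.05 success criterion are assumptions made for the example.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Null hypothesis: group labels are exchangeable. Every relabelling is
    enumerated, so this is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    count = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so float rounding does not exclude exact ties.
        if difference >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical measurements for a control and a treatment group.
control = [1, 2, 3, 4]
treatment = [5, 6, 7, 8]
alpha = 0.05  # assumed success criterion
p_value = exact_permutation_p_value(control, treatment)
reject_null = p_value < alpha
print(p_value, reject_null)
```

Fixing the success criterion before looking at the result keeps the experiment aligned with the workflow step that defines it.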
Analysis Workflow & Activities
• Empirical analysis of subsets of data
• Understand topology of data, boundaries (sets / subsets, complete corpus,
totality of data)
• Outlier identification and profiling
• How significant are outliers to overall topology
• Comparative exclusion and profiling of resulting data subsets to understand their role,
discover principal components
• Find and analyze patterns, areas of interestingness / deserving attention
• Find and analyze central actors / factors (in existing model that produced
source data, in topology of working data, in patterns, etc.)
• ID and understand their impact on local and global data topology and primary metrics if in
several ways / more than one axis / at the same time
• Discover and analyze relationships amongst central actors
• Understand cycles, trends, changes (dynamic characteristics) for core
actors, topology, patterns and structure
• Understand causal factors
• Codify / create new model reflecting insights & outcomes from experiments
• dynamic working data sets & subset
• iterative
• experimental frame
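
For the outlier identification and comparative exclusion activities listed above, one simple starting point is Tukey's interquartile-range rule. The sketch below is an assumption about how such a step might look, not a method reported by interviewees; the fence multiplier `k=1.5` is the conventional default and the amounts are made up.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Split values into (inliers, outliers) using Tukey's IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


# Hypothetical transaction amounts with one extreme value.
amounts = [12, 15, 14, 13, 16, 15, 14, 250]
inliers, outliers = iqr_outliers(amounts)
print("outliers:", outliers)
print("mean with / without outliers:",
      sum(amounts) / len(amounts), sum(inliers) / len(inliers))
```

Comparing summary statistics with and without the excluded points gives a first sense of how significant the outliers are to the overall shape of the data.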
Key Workflows
Insight Consumer <> Data Scientist
originate, define, address discovery effort
Data Scientist > Data Engineer
create & evolve apps to address new & in-progress efforts
Analyst <> Analyst
define & address in-progress discovery efforts
Data Scientist > internal networks
create & curate archive & community
Needs
What are the most common and useful statistical techniques you use during
discovery and analysis efforts?
What statistical capabilities or functions would be very useful if provided within
discovery applications, and where would they be useful?
“(1) The most commonly used statistical techniques used to date (in our strategic planning work) are:
dimensionality reduction (partition clustering, multiple correspondence analysis), factor analysis,
partition clustering (k-means, k-medoids, fuzzy clustering), cluster validation techniques (silhouette,
Dunn’s index, connectivity), multivariate outlier detection, linear regression, and logistic regression.

(2) Techniques that would assist with identifying outliers or invalid data. Much of this work seems to be
done by hand. I believe that we are also getting to the point where we could start using linear
regression and splines (for showing trends).”
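
As an illustration of the partition clustering the respondent mentions, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. It is not the respondent's implementation: the points are invented, and the initial centroids are passed in explicitly so the example is deterministic, whereas real use would typically randomize them.

```python
from statistics import mean


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, centroids, max_iterations=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(mean(axis) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

A cluster validation measure such as the silhouette would normally follow to judge whether the chosen number of clusters is reasonable.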
Needs
For example, would system-generated descriptive statistical visualizations be
useful for whole data sets - or for smaller user-selected groups of attributes?
!
Would it be useful for the application to analyze and suggest possible
distribution models it sees in the data; for the values of individual attributes,
and/or for larger sets of data?
“With regards to your last question on visualization, we have put in significant effort to use
visualization in our Endeca installation. We have built visualizations such as tree maps, flow diagrams,
sun burst diagrams, scatter plots showing clusters, and hierarchical edge bundling diagrams to explore
our data sets.

Our data tends to be qualitative rather than quantitative so this drives much of our visualizations.

So yes, interactive descriptive statistical visualization would be helpful – on the complete data set
and individual attributes.”
Needs
1. What are the most common statistical techniques you use at work -
descriptive, inferential, or otherwise? What are the most valuable?
2. What are the most common visualizations you use to present findings or
share insights? What are the most valuable?
“(1) We do a lot of chi-square tests, permutation tests, false discovery rate correction, Bonferroni
correction, 2x2 Fisher exact test, logistic regression.

I also use SVM, Artificial Neural Networks (ANN), Naive-Bayes Classifiers (NBC), parts of speech
taggers.

(2) ROC curves, tables with p-values or odds ratios or hazard ratio (http://en.wikipedia.org/wiki/Hazard_ratio)

Things  p-value
XYZ1    0.001
XYZ2 ...
etc.”
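
Two of the corrections named in this answer, Bonferroni and false discovery rate correction, can be sketched in a few lines. The FDR procedure shown is Benjamini-Hochberg, which is an assumption: the respondent does not say which FDR method they use. The p-values are invented for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for index in order[:cutoff]:
        rejected[index] = True  # reject every hypothesis up to the cutoff
    return rejected


# Hypothetical p-values from five tests.
p_values = [0.001, 0.012, 0.025, 0.045, 0.2]
by_bonferroni = bonferroni(p_values)
by_bh = benjamini_hochberg(p_values)
print(by_bonferroni)
print(by_bh)
```

On the same inputs Bonferroni rejects fewer hypotheses than Benjamini-Hochberg, which reflects the stricter error rate it controls.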
Needs
1. What are the most common statistical techniques you use at work -
descriptive, inferential, or otherwise? What are the most valuable?
2. What are the most common visualizations you use to present findings or
share insights? What are the most valuable?
“Logistic Regression, Decision Trees, Markov Models, Area Under Curve”
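
"Area Under Curve" in this answer presumably refers to the area under the ROC curve. A minimal sketch, assuming binary labels coded as 1 and 0, computes it as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The labels and scores below are hypothetical model outputs.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores; 1 marks the positive class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of examples, so a rank-based formulation would be preferable for large data sets.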
[Figure: Sense Makers: Information Management Ability. Axes: Data Skills Level (Low / none to High) and Composition Capability (Low / Use to High / Make). Roles plotted: Casual Analyst, Analytical Manager, Analyst, Problem Solver, Data Scientist. Capability labels: Use Models, Customize Models, Create New Models, Create Complex Models]
Materials
• http://www.datasciencecentral.com/
• Ben Lorica’s blog: http://strata.oreilly.com/ben
• https://blog.twitter.com/tags/twitter-data
• http://www.slideshare.net/s_shah/the-big-data-ecosystem-at-linkedin-23512853
Algorithms (ex: computational complexity, CS theory)
Back-End Programming (ex: JAVA/Rails/Objective C)
Bayesian/Monte-Carlo Statistics (ex: MCMC, BUGS)
Big and Distributed Data (ex: Hadoop, Map/Reduce)
Business (ex: management, business development, budgeting)
Classical Statistics (ex: general linear model, ANOVA)
Data Manipulation (ex: regexes, R, SAS, web scraping)
Front-End Programming (ex: JavaScript, HTML, CSS)
Graphical Models (ex: social networks, Bayes networks)
Machine Learning (ex: decision trees, neural nets, SVM, clustering)
Math (ex: linear algebra, real analysis, calculus)
Optimization (ex: linear, integer, convex, global)
Product Development (ex: design, project management)
Science (ex: experimental design, technical writing/publishing)
Simulation (ex: discrete, agent-based, continuous)
Spatial Statistics (ex: geographic covariates, GIS)
Structured Data (ex: SQL, JSON, XML)
Surveys and Marketing (ex: multinomial modeling)
Systems Administration (ex: *nix, DBA, cloud tech.)
Temporal Statistics (ex: forecasting, time-series analysis)
Unstructured Data (ex: noSQL, text mining)
Visualization (ex: statistical graphics, mapping, web-based dataviz)
Skills
Figure 3-3. There were interesting partial correlations among each
respondent’s primary Skills Group (rows) and primary Self-ID Group
(columns). The mosaic plot illustrates the proportions of respondents
who fell into each combination of groups. For example, there were few
Data Researchers whose top Skill Group was Programming.
Skills

When Everyone Is A Designer: Practical Techniques for Ethical Design in the D...
 
The DIY Future: What Happens When Everyone Is a Designer
The DIY Future: What Happens When Everyone Is a DesignerThe DIY Future: What Happens When Everyone Is a Designer
The DIY Future: What Happens When Everyone Is a Designer
 
Designing Ethically - EuroIA 2007 Ethics Panel Presentation
Designing Ethically - EuroIA 2007 Ethics Panel PresentationDesigning Ethically - EuroIA 2007 Ethics Panel Presentation
Designing Ethically - EuroIA 2007 Ethics Panel Presentation
 
It Seemed Like The Thing To Do At Time: State of Mind and Failure
It Seemed Like The Thing To Do At Time: State of Mind and FailureIt Seemed Like The Thing To Do At Time: State of Mind and Failure
It Seemed Like The Thing To Do At Time: State of Mind and Failure
 

Data Science Highlights

  • 3. Data Scientist
 Square - San Francisco Bay Area
 Job Description
 Square is hiring a Data Scientist on our Risk team. The Risk team at Square is responsible for enabling growth while mitigating financial loss associated with transactions. We work closely with our Product and Growth teams to craft a fantastic experience for our buyers and sellers.
 Desired Skills & Experience
 As a Data Scientist on our Risk team, you will use machine learning and data mining techniques to assess and mitigate the risk of every entity and event in our network. You will sift through a growing stream of payments, settlements, and customer activities to identify suspicious behavior with high precision and recall. You will explore and understand our customer base deeply, become an expert in Risk, and contribute to a world-class underwriting system that helps Square provide delightful service to both buyers and sellers.
 To accomplish this, you are comfortable writing production code in Java and conducting exploratory data analysis in R and Python. You can take statistical and engineering ideas from prototype to production. You excel in a small team setting and you apply expert knowledge in engineering and statistics.
 Responsibilities
    1. Investigate, prototype and productionize features and machine learning models to identify good and bad behavior.
    2. Design, build, and maintain robust production machine learning systems.
    3. Create visualizations that enable rapid detection of suspicious activity in our user base.
    4. Become a domain expert in Risk.
    5. Participate in the engineering life-cycle.
    6. Work closely with analysts and engineers.
 Requirements
    1. Ability to find a needle in the haystack. With data.
    2. Extensive programming experience in Java and Python or R.
    3. Knowledge of one or more of the following: classification techniques in machine learning, data mining, applied statistics, data visualization.
    4. Concise verbal and written articulation of complex ideas.
 Even Better
    1. Contagious passion for Square’s mission.
    2. Data mining or machine learning competition experience.
 Company Description
 Square is a revolutionary service that enables anyone to accept credit cards anywhere. Square offers an easy to use, free credit card reader that plugs into a phone or iPad. It's simple to sign up. There is no extra equipment, complicated contracts, monthly fees or merchant account required.
 Co-founded by Jim McKelvey and Jack Dorsey in 2009, the company is headquartered in San Francisco.
  • 4. Sense Maker Segment
 Sense makers need to create and/or employ insights to accomplish their business goals and satisfy their responsibilities.
 These insights emerge from independent and collaborative discovery efforts that involve direct interaction with discovery applications, and participation in discovery environments.
 Profiles: Insight Consumer, Analyst, Casual Analyst, Data Scientist, Analytics Manager, Problem Solver
  • 6. Data Scientist
 Data Scientist / Senior Research Scientist
 Data Scientists work with other members of the Data Science team, using emerging methods and tools to engage with ‘Big Data’ from a variety of external and internal sources. Data Scientists aim to generate actionable insights that transform the organization; enhance existing products, services and operations; and identify, define and prototype new data-driven products, services, and offerings. They have advanced analytical skills and/or a specialized educational background, and rely on open-source and custom-created tools to address the ad-hoc and open-horizon questions the Data Science team takes on. Data Scientists collaborate with Insight Consumers, evolving and publishing insights and prototypes of new offerings.
 Business Goals & Work Setting
    • Create new data-driven products, services, business opportunities
    • Transform the business with insights derived from Big Data
    • Create effective tools and infrastructure for the data science group and other analytical groups within the organization
    • Develop prototypes based on proprietary or open source tools
    • Prototype new ways to visualize and understand data relationships
    • May work within a business unit, providing analytical capability to that unit only, or within a centralized Data Science group
 Discovery Needs
    • Solves complex, critical problems and significant, unique issues
    • Has numerous and dynamic ill-formed questions with unpredictable needs for data, visualization, and discovery capabilities
 Discovery Tools
    • Open source tools and platforms for big data, ETL, visualization, analysis, statistics: Hadoop, Cassandra, Kafka, Voldemort
    • Open source algorithm languages: R, Hive, Pig
    • Custom-developed analytical tools
 Engagement w/ Discovery Applications
    • Creates custom discovery applications to suit their own needs
    • Application lifecycle involvement: rolls their own from scratch, iterates and then publishes to wider audiences / productizes
    • Original author of all discovery solution elements: data / data sets, information models, discovery applications and workspaces
    • Shares / publishes insights to decision-making groups & social forums in the business
 Collaboration
    • Works with Engineers and Software Architects to create prototypes and products
    • Collaborates with Data Scientists on ill-formed questions
 Skills & Expertise
    • Data management, analytics modeling and business analysis
    • Prototyping / software engineering
    • Discovery: advanced statistics, quantitative and qualitative analysis, machine learning, data mining, natural language processing, computational linguistics, broad knowledge of applied mathematics, statistical methods and algorithms
  • 7. Profiles & Discovery Problem Spectrum
 [Figure: the profiles Data Scientist, Analyst (all), Casual Analyst and Problem Solver plotted against a problem spectrum with the poles Ill-formed and Well-formed]
  • 12. What sort of animal?
 They seem different than analysts:
    • problem set
    • relationship to discovery tools
    • skills and professional profile
    • discovery / analytical methods
    • perspective
    • workflow and collaboration
 Are they? How?
  • 13. Areas of Investigation • Workflow • Environment • Organizational model • Pain points • Tools • Data landscape • Analytical practices • Project structure • Unmet needs
  • 16. Discussion Guide
 Can you please walk me through a recent or current project?
    a. How was the project initiated?
    b. How defined was the business problem in the beginning? Did the problem change?
    c. Where/who did you obtain data sets from? How did you make the decision?
    d. Describe the data you used: What did the data sets look like? How big were they? Were they structured or unstructured?
    e. What tools or techniques did you use to do the analyses? Did they map to the specific steps you mentioned just now?
    f. How did you decide these were the tools/techniques to use? To what extent were these decisions made by yourself and to what extent were they standardized by your group/team?
    g. How did you present the results of your analyses? What tools did you use? What do you like and dislike about your current tool set?
    h. Which stage of this project was the most challenging? To what extent did the tools satisfy what you intended to do? What features were lacking?
    i. How much collaboration was there during each stage of the project?
       i. Background and role of collaborators
       ii. Collaboration modes
       iii. Types of information shared
 Thinking about the projects you have worked on, is there a common approach you take to address these problems? How did you decide on this approach/tools?
  • 22. Dana Data Scientist
 Creates data-driven insights, offerings, and resources to transform the organization
 “I’ll do whatever it takes – wrangle, extract, manipulate, analyze, experiment, prototype – to use data to drive value & innovate”
 Job Title: Senior Data Scientist
 Company: LinkedIn
 Work Experience: 10 Years
 Education: Ph.D. Statistics, MS Bio-Informatics
 Background
 Dana is a Senior Data Scientist who has worked at LinkedIn for 5 years. Dana’s education includes a Ph.D. in Statistics and an MS in Bio-Informatics. Dana’s previous work includes positions in academic research groups as a doctoral candidate and post-doc, as well as software engineering roles in the Internet & technology industries.
 Work Context
    • Dana works with several other data scientists and her Analytics Manager on a centralized team
    • Dana and her colleagues aim to create data driven insights, features, resources, and offerings that deliver strategic value to LinkedIn
    • Dana works with Analysts on other teams to define and create discovery tools, data sets, and methods for use by their groups at LinkedIn
    • Dana & team are visible & well established within LinkedIn, and have a voice in product strategy and operational context; they have a high degree of autonomy in defining data science projects
    • Dana works with Insight Consumers to suggest and determine potential new data driven offerings to prototype and evaluate
 Key Goals
    • Leverage data to support the org mission
    • Enhance products & services with data-driven insights and features
    • Use data to identify new opportunities and prototype/drive new customer offerings
    • Create useful data sets/streams, measures, & resources (e.g., data models, algorithms, etc.)
 Activities
    • Mines, analyzes, & experiments with data to identify patterns, trends, outliers, causal factors, predictive models, & opportunities
    • Defines and explains newly devised measurements, predictive models, & insights
    • Compares effectiveness of operations at achieving company goals for engagement, growth, data quality
    • Produces & explores new data sets
    • Collaborates with other data scientists to capture new data streams
    • Prototypes new data driven site features/offerings
    • Runs data based experiments to test/evaluate models, hypotheses & prototypes
    • Communicates & explains analyses to colleagues & Insight Consumers
 Tools
    • Open source data manipulation, mining & analysis tools including R, Pig, Hadoop, Python, etc.
    • Statistical packages such as SAS, SPSS, etc.
    • Custom analytical tools built using open source components and languages
 Pain Points
    • Defining and capturing useful measures of online attention
    • Getting all the data analytic tools to work together properly
    • No current workflow support or tools for data wrangling, analysis, experimentation, and prototyping
 Wish List
    • Effective tools to help experiment with and evaluate value / utility of features and activities for users
    • Ability to rapidly prototype data-driven features w/out risk of online service disruptions
 Typical Discovery Scenarios & Problems
    • How can we leverage data to increase online engagement with LinkedIn?
    • How should we measure engagement & what factors drive it?
    • What aspects of a personal profile are most likely to encourage / discourage new connections between people?
    • How can we increase people’s activity and contributions to topical discussion groups?
    • What factors drive the effectiveness of our marketing campaigns?
    • Why did one of our marketing campaigns work exceptionally well?
    • How can we leverage data to help recruiters identify and communicate effectively with qualified and potentially available candidates?
 Sample Workflow
    • Analyze & Identify causal/predictive factors: Who are the best candidates to contact for a job based on recruiter needs and profile content?
    • Prototype & Experiment with data driven feature: How can we prototype/evaluate this w/out disrupting the site?
    • Gather Data & Analyze Results: Use descriptive, inferential, and predictive statistics to evaluate results
    • Summarize & Communicate: Review findings with colleagues; summarize, visualize, and communicate key findings to Insight Consumers/decision makers
  • 27. Nature of sense making activity
 Business Analytics: Intuitive, Manual, Gradual, Individual
 Data Science: Empirical, Augmented, Accelerated, Cooperative*
  • 28. The Essence
    • Empirical perspective
    • Business imperatives drive activities
    • Analytical approach
    • Recipe is always the same
    • Engineering always present
    • Data challenges are paramount: they consume 60% - 80% of time and effort
    • Data volumes range from huge to moderate (PB to MB)
    • Domain often drives analysis
    • Data scientists already have self-service
    • Some new problems, many the same
    • Use ‘advanced’ analytics, not conventional BA
    • Innovate by applying known analyses to new data
    • Current workflow fragmented across tools and data stores
    • Success can be a model, product, insight, infrastructure, tool
  • 29. State of the Discipline
 A small set of formally constituted Data Science teams at major Internet and technology companies (Facebook, Google, Microsoft, Yahoo, Twitter, LinkedIn, eBay, Amazon) lead the field in most identifiable respects:
    • maturity of practice - sophistication of methods, quality of infrastructure
    • history and tenure as formal function / group
    • business integration and impact
    • internal and public visibility
    • pace of innovation in methods, tools, architecture
    • quality and rate of contributions to open source and other tools / infrastructure
    • role in the industry and public discourse on data science: visibility in community, publication of experiments and findings, etc.
  • 30. Tooling & Infrastructure
 Leading shops have their own comprehensive and often home-built / heavily customized data science environments, tools, infrastructure.
 This infrastructure is aligned to the particulars of their domain and business. Their data science environments are sometimes considerably more 'mature' than those of other shops.
 The large majority of existing data science teams and practices are 'followers' of these leaders, in the sense that while they have idiosyncratic problems and varying domains to address, they rely on innovation from the DS leaders to guide the evolution of their data science practices.
 Their environments reflect a mix of some purpose-built data science components, and infrastructure extended / adapted from business analytic needs such as BI.
  • 31. Tooling & Infrastructure
 Many organizations are establishing new data science capabilities. A minority of these create new data science teams / practices from scratch without building out other conventional analytical capabilities such as BI. They will need new environments to support data science activities, and may leapfrog older generations of analytic environment, following leaders by directly creating new 'stacks' oriented more specifically for data science.
 The majority of organizations are creating new data science capabilities by building on existing analytical groups and functions. In terms of environments and infrastructure, these organizations have existing analytical environments aligned to BI and other business analytic functions, not specifically adapted to data science needs. Cumulative investment in these environments can be very high.
 New teams will need new tools. Existing teams will need new tools to support new discovery activities.
 Berkeley Data Analytics Stack is the most visible open source 'platform' at the moment. No interview participants mentioned it.
  • 32. Organizational Model
 Data science capability is provisioned via standard org models (ranging across in house, external, centralized, embedded, etc.).
 The ways data science teams and practice groups are managed, and their relationship to the orgs they are part of, seem to be conventional / familiar.
 We can summarize the landscape of organizational models for providing data science capability by plotting the size of the data science team / pool of resources vs. the 'distance' from the problem / need.
 The landscape reflects common patterns for specialized expertise.
 This could shift over time as discovery maturity increases overall, first within the analytics industry, then within the general business realm.
  • 33. Discovery Problems
 Discovery efforts are set in motion by Insight Consumers, not Data Scientists. The success of efforts is gauged by Insight Consumers. Insights are used by the originating Insight Consumers, not other analysts, and rarely other Insight Consumers.
 Multiple hypotheses are often explored in parallel, supported by multiple data sets / interim data products.
 Useful reconstruction of analytical workflows requires a linear history of all steps / activities.
  • 34. Discovery Problems
 Data science resources - individuals, projects, and teams - are always aligned to business areas or strategic goals: e.g. the Content Insights team at LinkedIn supports analytical goals related to LinkedIn's major push to enhance its media presence and role in media.
 At large scales of group, this inverts - for example within a company, communities of practice are aligned to a discipline, and will include members whose activities span the needs of all the business units.
 No analytical efforts begin completely open-ended, with no idea of the nature or import of resulting insights.
 There is almost always a hypothesis, or more than one. (Even in more academic / research oriented settings, there is no basic research - all investigations are purposive and grounded in defined business intent.)
  • 35. PROBLEM NATURE
    • Well-defined
    • Explicit form: Why, What, and How questions
    • Implicit form: which question
    • Hypotheses are driven by domain knowledge or work experience
    • Not very different from the problems business analysts address
 Businesses address the same problems they have been working on, which are determined at the very beginning, before resources are allocated. Data scientists do not necessarily contribute to initiating new problems.
  • 37. Skills Portfolio
 Data scientists use three kinds of languages: analysis (R, Matlab), scripting (Python, Perl), data processing (SQL, Pig).
 Analytical environments should allow integration of languages / capabilities they offer.
 Every analyst has their preferred language / method and defaults to using their own for analytical efforts. This is true even within centralized analytical teams.
  • 39. Discovery Maturity
    • Discovery is poorly understood and little recognized as a capability. It is rarely mentioned by any of the Data Science / Analytics professionals spoken with. When mentioned, it is seen as a small-scale activity and / or a desired outcome of particular projects, not something the organization needs to be able to do in an ongoing / comprehensive / large-scale fashion, such as understanding customers.
    • Data scientists understand their own challenges in terms of which stages / aspects of a data-centric workflow require the greatest time and effort, or present the most complexity or potential for introducing uncertainty / ambiguity into the efforts. Broader framings are the need for or desire to work on data-driven products, or to transform and improve business through offering data-centered insights.
    • Product-centric data scientists (who aim directly at making data-driven offerings) are a small minority of the active community. Many more are engineers with strong data skills, and many more are analysts trying to acquire data science skills / perspective.
  • 40. Supporting Factors • Regardless of particulars, the core ingredients remain the same: analytical skills and perspective, domain knowledge, engineering / tooling skills and perspective. • In data science practices, analysis is always enabled by engineering - either localized to the data science team, or centrally provided via IT. • In BI practices, analysis is always enabled by IT and systems consultants / integrators (in house or external). • Leading DS groups rely on a number of hybrid approaches to support data cleansing and the evaluation of models, insights, and results - e.g. crowd-sourced preparation of data and checking of results for prototypes and experiments. • Data scientists rarely productionize code, analytical workflows, or analytical tools. Engineers / IT convert 'prototype' artifacts created by data scientists into production code / tools.
  • 41. Perspective • Analytical: The analytical perspective is the center of definition for all analytical roles. Contrast with engineers, who "make stuff". Analytical roles figure things out for some purpose: whether a model to inform a product prototype or to provide insight. • Empirical: The empirical perspective is distinct from the analytical perspective, and marks 'true' data scientists. It revolves around framing and testing hypotheses formally and informally, often requires validation and interrogation of experimental methods and results by others, and expects a significant degree of transparency at (all) stages of the analytical effort.
  • 42. Cooperation and Collaboration • Discovery efforts are structured as individual efforts - insights come from individual analytical engagement with data sets. • Collaboration between analysts is asynchronous. • Diversity of analytical tools / languages in practice = barrier to cooperation and collaboration. • There is little re-use of analytical insights by analysts to further other efforts. • When tools and/or problem domains are stable / known, analysts create individual and group assets for reuse - e.g. R script libraries, code snippets for SAS, templates for data set file formats and structures. • Intermediate work products created during analytical work (data sets / subsets, code, analytical scripts, algorithms, interim results, hypotheses) are often perceived as irrelevant or throwaway, if not outright wrong. Little investment is made to annotate / preserve intermediate work products for individual or group re-use, sharing, review.
  • 43. THE MANY SHADES OF COLLABORATION • Independent: Have-it-all type data scientist (I know, I design & I implement) • Linear: Complementary (Analysts know, data scientists design, engineers implement) • Project-based: The missing piece (Data scientists lead or support engineers) • Consultancy: From abstract to concrete (Some data scientists know & design, some other data scientists implement)
  • 44. Data Landscape • The physical location of data - where stored / what environment - is a significant cost factor for almost all aspects of analytical work. • Distributed data (managed / located in multiple stores) increases costs for many individual steps in analytical workflows. • Distributed data costs often = barrier to conducting insightful analysis using multiple techniques / steps. Analysts default to basic / simple analysis to avoid high effort / low probability of success. • For analysts with low levels of db / data wrangling skill, even marginal distributed data costs = prohibitive barrier to engaging with data. • Most analysts reported having to migrate all of the data sets into the same data processing framework to begin analysis. [If all the data were in one place...]
  • 45. DATA NATURE • Messy: various forms (Web logs, web pages, genome data, sales revenues...) • Scattered: Data scientists have to search in the wild (outside of enterprise databases) • Started “Big”, ended “Lean”: Meaningful data units are small in size • Standardization is key to all data science work: why engineers become data scientists • Data scientists are “data foragers” and “data format equalizers”. They have the ability to manipulate large data sets and gradually narrow the data sets down to the exact units needed for analysis.
  • 46. Algorithms and Analytical Tools • Well-known algorithms and methods are used to plan and structure experiments, discover insights, drive the creation of new models, and evaluate the effectiveness of new models & products. • The algorithm and method are often determined by domain, such as TF-IDF for IR, Smith-Waterman for bioinformatics,
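As an illustration of a domain-determined method, the sketch below computes TF-IDF weights using only the Python standard library. It assumes the plain textbook weighting (relative term frequency times log inverse document frequency) and whitespace tokenization; the function name `tf_idf` and the toy corpus are hypothetical and not taken from the slides.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, for illustration only.
corpus = [
    "risk model for payments",
    "risk model for settlements",
    "genome data for alignment",
]
scores = tf_idf(corpus)
```

A term that occurs in every document (here "for") gets weight zero, while a term unique to one document (here "payments") gets the highest weight in that document. Production IR systems typically use smoothed or normalized variants of this formula.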
  • 47. PROCESS NATURE • Wicked: Solutions are rarely pre-defined • Iterative three-step cycle: Data collection, data cleansing, & data analysis • Trial-and-error: Hypothesis revision, hypothesis validation, & data recollection • Ad-hoc analysis: chance encounters • Data scientists provide new perspectives to address old problems. The path to the solution is usually exploratory, but the goal has always been clear and pre-defined.
  • 50. Data Science Workflow • Frame problem / goal of effort • Identify and extract data to be used in effort from whole corpus / totality of available data • Exploratory identification and selection of working data for use in experiments • Define experiment(s): hypothesis / null hypothesis, methods, success criteria • Derive insight(s) • Wrangle, process, visualize, interpret • Codify / create new model reflecting insights & outcomes from experiments • Validate new model(s) • Provision training data • Train new model • Validation and outcome of training model • Hand-off for implementation on production systems / as production code
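The "define experiment" step (hypothesis / null hypothesis, methods, success criteria) can be made concrete with a small sketch. The example below assumes an exact two-sample permutation test on the difference of means as the method and a significance threshold as the success criterion; the function name, the sample values, and the threshold `ALPHA` are hypothetical choices for illustration, not part of the workflow described in the slides.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Null hypothesis: group labels are exchangeable (no difference).
    Enumerates every relabeling, so it only suits small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Count relabelings at least as extreme as the observed split.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


ALPHA = 0.05  # hypothetical success criterion
p_value = exact_permutation_test([1, 2, 3], [10, 11, 12])
reject_null = p_value < ALPHA
```

With only three observations per group there are 20 possible relabelings, so the smallest achievable two-sided p-value is 0.1 and the null hypothesis cannot be rejected at 0.05; deciding the sample size before running the experiment is part of defining it.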
  • 51. Analysis Workflow & Activities • Empirical analysis of subsets of data • Understand topology of data, boundaries (sets / subsets, complete corpus, totality of data) • Outlier identification and profiling • How significant are outliers to overall topology • Comparative exclusion and profiling of resulting data subsets to understand their role, discover principal components • Find and analyze patterns, areas of interestingness / deserving attention • Find and analyze central actors / factors (in existing model that produced source data, in topology of working data, in patterns, etc.) • ID and understand their impact on local and global data topology and primary metrics if in several ways / more than one axis / at the same time • Discover and analyze relationships amongst central actors • Understand cycles, trends, changes (dynamic characteristics) for core actors, topology, patterns and structure • Understand causal factors • Codify / create new model reflecting insights & outcomes from experiments
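The outlier identification and profiling steps above can be sketched as follows. The slides do not name a method, so this example assumes Tukey's interquartile-range fences, one common choice; the function name `iqr_outliers` and the sample values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical transaction amounts, for illustration only.
amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(amounts)

# Profile how much the outliers matter to a summary metric
# by comparing the mean with and without them.
mean_all = statistics.mean(amounts)
mean_without = statistics.mean(v for v in amounts if v not in outliers)
```

Comparing `mean_all` with `mean_without` gives a rough answer to "how significant are outliers to overall topology" for a single metric; multivariate data would need a different technique.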
  • 52. • dynamic working data sets & subsets • iterative • experimental frame
  • 53. Key Workflows • Insight Consumer <> Data Scientist: originate, define, address discovery effort • Data Scientist > Data Engineer: create & evolve apps to address new & in-progress efforts • Analyst <> Analyst: define & address in-progress discovery efforts • Data Scientist > internal networks: create & curate archive & community
  • 54. Needs What are the most common and useful statistical techniques you use during discovery and analysis efforts? What statistical capabilities or functions would be very useful if provided within discovery applications, and where would they be useful? “(1) The most commonly used statistical techniques used to date (in our strategic planning work) are: dimensionality reduction (partition clustering, multiple correspondence analysis), factor analysis, partition clustering (k-means, k-medoids, fuzzy clustering), cluster validation techniques (silhouette, Dunn’s index, connectivity), multivariate outlier detection, linear regression, and logistic regression.” “(2) Techniques that would assist with identifying outliers or invalid data. Much of this work seems to be done by hand. I believe that we are also getting to the point where we could start using linear regression and splines (for showing trends).”
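Of the cluster validation techniques named in the response above, the silhouette is simple to sketch. The example below is a minimal standard-library version assuming Euclidean distance and at least two clusters; the function name `silhouette_scores` and the toy points are hypothetical.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b).

    a = mean distance to the other points in the same cluster
    b = smallest mean distance to the points of any other cluster
    Assumes at least two distinct labels; singleton clusters score 0.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical 2-D points forming two well-separated groups.
pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_scores(pts, [0, 0, 1, 1])
bad = silhouette_scores(pts, [0, 1, 0, 1])
```

Scores near 1 indicate a point sits well inside its cluster, while negative scores (as in the deliberately mislabeled `bad` assignment) suggest it would fit better elsewhere.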
  • 55. Needs For example, would system-generated descriptive statistical visualizations be useful for whole data sets - or for smaller user-selected groups of attributes? Would it be useful for the application to analyze and suggest possible distribution models it sees in the data; for the values of individual attributes, and/or for larger sets of data? “With regards to your last question on visualization, we have put in significant effort to use visualization in our Endeca installation. We have built visualizations such as tree maps, flow diagrams, sun burst diagrams, scatter plots showing clusters, and hierarchical edge bundling diagrams to explore our data sets. Our data tends to be qualitative rather than quantitative so this drives much of our visualizations. So yes, interactive descriptive statistical visualization would be helpful – on the complete data set and individual attributes.”
  • 56. Needs 1. What are the most common statistical techniques you use at work - descriptive, inferential, or otherwise? What are the most valuable? 2. What are the most common visualizations you use to present findings or share insights? What are the most valuable? “(1) We do a lot of chi-square tests, permutation tests, false discovery rate correction, Bonferroni correction, 2x2 Fisher exact test, logistic regression. I also use SVM, Artificial Neural Networks (ANN), Naive-Bayes Classifiers (NBC), parts of speech taggers.” “(2) ROC curves, tables with p-values or odds ratios or hazard ratio (http://en.wikipedia.org/wiki/Hazard_ratio), e.g. Things | p-value; XYZ1 | 0.001; XYZ2 | ...; etc.”
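Two of the techniques listed in this response, the 2x2 Fisher exact test and the Bonferroni correction, are short enough to sketch with the standard library. The version below assumes the common two-sided definition that sums the probabilities of all tables no more likely than the observed one; the function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum over tables at most as likely as the observed one
    # (small relative tolerance guards against float ties).
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


p_single = fisher_exact_two_sided(3, 1, 1, 3)
adjusted = bonferroni([0.001, 0.02, 0.5])
```

Bonferroni is the most conservative of the corrections mentioned; the false discovery rate correction named alongside it trades some of that strictness for more power when many tests are run.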
  • 57. Needs 1. What are the most common statistical techniques you use at work - descriptive, inferential, or otherwise? What are the most valuable? 2. What are the most common visualizations you use to present findings or share insights? What are the most valuable? “Logistic Regression, Decision Trees, Markov Models, Area Under Curve”
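"Area Under Curve" in this response presumably refers to the area under the ROC curve. A minimal sketch, assuming binary labels coded 0/1 and using the pairwise-ranking formulation (the fraction of positive/negative pairs in which the positive example scores higher, with ties counted as half); the function name `roc_auc` and the sample scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores, for illustration only.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

This pairwise form is quadratic in the number of examples, which is fine for a sketch; library implementations typically sort the scores once instead.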
  • 58. [Figure: Sense Makers. Axes: Data Skills Level (Low / none to High); Composition Capability (Low / Use to High / Make); Information Management Ability. Roles: Casual Analyst, Analyst, Analytical Manager, Problem Solver, Data Scientist. Model activities: Use Models, Customize Models, Create New Models, Create Complex Models.]
  • 59. Materials • http://www.datasciencecentral.com/ • Ben Lorica’s blog: http://strata.oreilly.com/ben • https://blog.twitter.com/tags/twitter-data • http://www.slideshare.net/s_shah/the-big-data-ecosystem-at-linkedin-23512853
  • 61. Algorithms (ex: computational complexity, CS theory); Back-End Programming (ex: Java/Rails/Objective C); Bayesian/Monte-Carlo Statistics (ex: MCMC, BUGS); Big and Distributed Data (ex: Hadoop, Map/Reduce); Business (ex: management, business development, budgeting); Classical Statistics (ex: general linear model, ANOVA); Data Manipulation (ex: regexes, R, SAS, web scraping); Front-End Programming (ex: JavaScript, HTML, CSS); Graphical Models (ex: social networks, Bayes networks); Machine Learning (ex: decision trees, neural nets, SVM, clustering); Math (ex: linear algebra, real analysis, calculus); Optimization (ex: linear, integer, convex, global); Product Development (ex: design, project management); Science (ex: experimental design, technical writing/publishing); Simulation (ex: discrete, agent-based, continuous); Spatial Statistics (ex: geographic covariates, GIS); Structured Data (ex: SQL, JSON, XML); Surveys and Marketing (ex: multinomial modeling); Systems Administration (ex: *nix, DBA, cloud tech.); Temporal Statistics (ex: forecasting, time-series analysis); Unstructured Data (ex: noSQL, text mining); Visualization (ex: statistical graphics, mapping, web-based dataviz)
  • 63. Figure 3-3. There were interesting partial correlations among each respondent’s primary Skills Group (rows) and primary Self-ID Group (columns). The mosaic plot illustrates the proportions of respondents who fell into each combination of groups. For example, there were few Data Researchers whose top Skill Group was Programming. Skills