BIG DATA | How to explain it & how to use it for your career?
NetCom Learning
NetCom Learning – Managed Learning Services
Today’s Agenda
If you ask people what BIG DATA is they often say it is about a lot of
data. But the world has ALWAYS had a lot of data! It is about
datafication – a word so new that even spellcheck functions don’t
know it’s a real word!
• How BIG DATA changes career paths of even the most unsuspecting!
• How BIG DATA changes the way business decisions are made.
• How BIG DATA changes who makes the decisions & reshuffles the balance of power.
• What BIG DATA skills can you bring to the office tomorrow to increase your value?
The experienced data scientists & those managers who leverage them.
BIG DATA is a management tool even if you have other employees perform the coding.
BIG DATA is as ubiquitous as the internet.
Gut instinct now of less value.
Datafication
A modern technological trend that turns many aspects of our lives into computerized data and transforms that information into new forms of value.
Data → Information → Knowledge → Wisdom → Insight
Knowledge—Wisdom--Insight Vincent Suppa
This is the fulcrum that changes everything.
Data → Information → Knowledge → Wisdom → Insight → Actionable Insight
BIG DATA
A Metaphor / Illustration
Diagramming an Algorithm
Activity or purpose natural to or intended for a person or thing.
Relationship or expression involving one or more variables.
Algorithm Script
Just as voice mail and email obviated the manager’s need for secretarial functions, algorithms eating BIG DATA are now obviating tactical managerial functions.
Transactional Work
Tactical Work
Strategy needs to consume data.
Data, without strategy, has little value.
Modified sine wave
Sine wave
What is the difference between analogue and digital?
Datafication is only possible due to the digitization of analogue information.
Digital versus Analogue
Interprets continuous sine wave as a digital recreation.
This photo was taken on film – not a digital camera.
Are there data points within this “single” data point?
Social Construct
Another example of social construct
Now to the show.
Big data: broad term for data sets so large & complex that traditional data processing applications are inadequate.
A terabyte, petabyte & gigabyte walk into a bar...
To give us a sense of scale:
Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta
A yottabyte is 1,000 trillion gigabytes.
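The scale claim above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal (SI) prefixes where each step is 1,000 times the previous one; the function names `bytes_in` and `ratio` are made up for illustration.

```python
# Decimal (SI) byte prefixes, each 1,000 times larger than the one before it.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def bytes_in(prefix: str) -> int:
    """Return the number of bytes in one <prefix>byte, using powers of 1,000."""
    return 1000 ** (PREFIXES.index(prefix) + 1)


def ratio(larger: str, smaller: str) -> int:
    """Return how many <smaller>bytes fit into one <larger>byte."""
    return bytes_in(larger) // bytes_in(smaller)


if __name__ == "__main__":
    for name in PREFIXES:
        exponent = 3 * (PREFIXES.index(name) + 1)
        print(f"1 {name}byte = 10^{exponent} bytes")
    # One trillion is 10^12, so 1,000 trillion is 10^15.
    print(ratio("yotta", "giga") == 1000 * 10**12)
```

Binary prefixes (powers of 1,024) would give slightly different figures; the decimal convention is the one that matches the "1,000 trillion" statement.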
The Least You Need to Know About BIG DATA
BIG DATA manifests 3 basic shifts:
• From Small to All
• From Clean to Messy
• From Causation to Correlation
V. Suppa The Definitive 90 Thousand Foot Lecture on BIG Data© 2014
Scope of Traditional Data
• Data growth analogous to y = tan x.
• In 2000, ¼ of the world’s information was digital; the remainder was preserved in analog.
• Digital data doubles around every 3 years.
• In 2014 less than 2% of all stored information was analog. (And now we’re in 2017!)
Big Data is Not About Lots of Data
• Lots of data existed before Big Data!
• Big Data: the ability to render aspects of life never quantified before into data points.
• This is DATAFICATION … your new word of the day!
DATAFICATION
Location was datafied before GPS was invented
• Words treated as data.
• Friendships & likes datafied via Facebook.
• Shigeomi Koshimizu datafied body contour (body, posture, weight distribution, etc.).
• Quantified “sitting down”: measured the pressure drivers exert at 360 different points via sensors (0 to 256 scale).
Quality → Quantify
Datafication Turns Everything into a Data Point
Tools of Datafication
• inexpensive computers (commodity)
• powerful processors (commodity)
• basic statistics (commodity)
• clever software (commodity)
• smart algorithm (differentiator)
Lots of Data versus BIG DATA
Computers computing lots of data:
Teach a computer to translate by inputting bilingual dictionaries.
Computers computing BIG DATA:
Feed a computer years of Canadian parliamentary transcripts (French / English).
Statistically program it to infer which English word is the best alternative to the French.
In context, the French word lumière is a more appropriate substitute for the English word light than léger.
Isn’t this how a person translates?
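A toy version of the statistical idea above: count which French candidate shows up alongside each English context word in aligned sentence pairs, then let the context vote. This is only a sketch; the five sentence pairs, the candidate set and the function names are invented for illustration, and real systems use far larger corpora and proper alignment models.

```python
from collections import Counter, defaultdict

# Hypothetical aligned sentence pairs (English, French), made up for this sketch.
PAIRS = [
    ("turn on the light", "allume la lumière"),
    ("the light is bright", "la lumière est vive"),
    ("a light suitcase", "une valise légère"),
    ("this bag is light", "ce sac est léger"),
    ("the light went out", "la lumière s'est éteinte"),
]
# Candidate French renderings of the English word "light".
CANDIDATES = {"lumière", "léger", "légère"}


def context_counts(pairs, english_word, candidates):
    """For each English context word, count the French candidates seen with it."""
    counts = defaultdict(Counter)
    for english, french in pairs:
        english_words = english.split()
        if english_word not in english_words:
            continue
        hits = [w for w in french.split() if w in candidates]
        for context_word in english_words:
            if context_word != english_word:
                for hit in hits:
                    counts[context_word][hit] += 1
    return counts


def best_translation(context, counts):
    """Pick the candidate with the most votes from the surrounding words."""
    votes = Counter()
    for context_word in context:
        votes.update(counts.get(context_word, {}))
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    counts = context_counts(PAIRS, "light", CANDIDATES)
    print(best_translation(["turn", "on", "the"], counts))
    print(best_translation(["this", "bag", "is"], counts))
```

No dictionary or grammar rule is involved; the choice comes entirely from co-occurrence counts in the transcripts.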
A Quick Review & then … Causation to Correlation
• sampling population → entire population
• pristine data → non-curated messy data
• causation → correlation
Reasoning about how the world works is replaced with learning about associations among phenomena.
• Knowing cause “is” desirable.
• But cause is harder to figure out.
• Cause as illusion? Cognitive bias.
Saving Trucks
Saving Babies
Predicting Epidemics
Saving Buildings
Place sensors on parts to identify heat & vibrational patterns associated with failures leading to breakdowns.
Can predict a breakdown before it happens & replace parts in the garage & not on the side of the road.
• Data does not tell us why the part is in trouble.
• It reveals enough to know the what.
• Can guide investigations into discovering the underlying cause.
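One simple way to spot "the what" without knowing "the why" is to flag readings that stray far from a part's normal range. The sketch below assumes a baseline of healthy temperature readings and a 3-standard-deviation threshold; the numbers and the function name `flag_anomalies` are hypothetical, and production systems would use richer models.

```python
from statistics import mean, stdev


def flag_anomalies(baseline, readings, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations
    away from the mean of the healthy baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [i for i, value in enumerate(readings) if abs(value - mu) > threshold * sigma]


if __name__ == "__main__":
    # Made-up temperature readings from a part while it was known to be healthy.
    healthy = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2]
    # Made-up new readings coming off the truck.
    incoming = [70.0, 70.4, 73.5, 69.9, 75.2]
    print(flag_anomalies(healthy, incoming))
```

The flagged readings say nothing about the cause of the heat; they only tell the garage which part to look at first.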
Causation to Correlation
• When saving lives, knowing something is likely to occur is more important than knowing why.
• Eventually, “the why” will be investigated.
Can Big Data Save Babies?
Used Big Data to spot infections in premature babies before symptoms appear.
• Information flow >1000 data points per second
• Discovered correlations between very minor changes and more serious problems
Big Data Predicts Epidemics Better than CDC
CDC tracks patient visits to clinics.
Its information suffers from a 2-week reporting lag.
Google took the 50 million most commonly searched terms from 2003 – 2008
and compared them against historical influenza data from the CDC.
Searches were then correlated with CDC’s data on outbreaks of flu.
How All Three Shifts Are Illustrated
Small to All
Ran 100% of US searches for 6 years through an algorithm;
identified 45 search terms correlated with CDC data on flu outbreaks (runny nose, body aches, etc.).
Clean to Messy
Searches were imperfect, with misspellings & incomplete phrases, & included healthy
people searching on behalf of others.
Causation to Correlation
Will anyone claim typing symptoms in a search engine gives you the flu?
Big Data via searches predicts outbreaks in real time, whereas
CDC’s traditional data analytics lag by 2 weeks.
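The correlation step can be sketched as ranking search terms by how closely their weekly volume tracks the reported case counts. Everything below is illustrative: the weekly numbers are made up, and Pearson correlation is an assumption about the measure, not a description of what Google actually ran.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_terms(term_series, reference):
    """Sort search terms from most to least correlated with the reference series."""
    scores = {term: pearson(series, reference) for term, series in term_series.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up weekly flu case counts and weekly search volumes.
CDC_CASES = [10, 12, 30, 55, 40, 15]
SEARCHES = {
    "runny nose": [100, 130, 290, 560, 410, 160],
    "body aches": [80, 90, 200, 400, 310, 120],
    "cheap flights": [300, 310, 290, 305, 300, 295],
}

if __name__ == "__main__":
    for term, score in rank_terms(SEARCHES, CDC_CASES):
        print(f"{term}: {score:.3f}")
```

The ranking keeps the terms that move with the flu data and pushes unrelated terms to the bottom; it makes no claim that any search causes illness.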
Illegally subdivided buildings are more likely to catch fire.
200 inspectors respond to 25K complaints per year about overcrowded buildings.
NYC created a database of 900K buildings augmented by troves of data collected by 19 agencies:
• Records of tax liens
• Anomalies in utility usage
• Service cuts
• Missed payments
• Ambulance visits
• Local crime rates
• Rodent complaints
• Etc.
Big Data increases the productivity of each inspector
How Did They Do It?
1. Compared database with 5 years of building fires
2. Ranked by severity
3. Observed correlation. (Not causality!)
4. Data scientists triaged complaints for inspections.
Concluded that a building’s:
• type & age are the main predictors of fire; other variables superfluous
• permit for exterior brickwork correlated with lower risk of fire.
Result: inspections leading to vacate orders increased from 13% to 70%.
Building characteristics did not cause fire but were correlated with fire risk.
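The triage steps above can be sketched as: learn a historical fire rate for each combination of building traits, then sort incoming complaints by that rate. The records, trait names and functions here are hypothetical; this is not the model NYC used, only an illustration of ranking by correlation.

```python
from collections import defaultdict

# Made-up history: (building type, had brickwork permit, had a serious fire).
HISTORY = [
    ("subdivided", False, True),
    ("subdivided", False, True),
    ("subdivided", True, False),
    ("single_family", False, False),
    ("single_family", True, False),
    ("single_family", False, True),
]


def fire_rates(history):
    """Share of past buildings with a fire, per (type, permit) combination."""
    totals, fires = defaultdict(int), defaultdict(int)
    for building_type, permit, had_fire in history:
        key = (building_type, permit)
        totals[key] += 1
        fires[key] += int(had_fire)
    return {key: fires[key] / totals[key] for key in totals}


def triage(complaints, rates, default=0.0):
    """Order complaints so the highest historical fire rate is inspected first."""
    def risk(complaint):
        return rates.get((complaint["type"], complaint["permit"]), default)
    return sorted(complaints, key=risk, reverse=True)


if __name__ == "__main__":
    complaints = [
        {"id": "A", "type": "single_family", "permit": True},
        {"id": "B", "type": "subdivided", "permit": False},
        {"id": "C", "type": "single_family", "permit": False},
    ]
    print([c["id"] for c in triage(complaints, fire_rates(HISTORY))])
```

The permit does not prevent fires in this sketch any more than it does in the slide; it simply travels together with lower risk in the historical records.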
Spending money on the exterior correlates with an up-to-code interior.
But just the intent to begin work correlates enough to predict an outcome.
1. Pull disparate sets of texts & put them into a “point of singularity.” Currently 70% of data is text. Pictures to be quantified under separate protocols.
2. Create a Corpus → body of text to be analyzed.
3. R, for example, has a set of functions to clean up a Corpus by excluding data points superfluous to analysis (delete commas, periods & words such as but & and, etc.). R cleans up files by reducing the corpus to primary words crucial to analysis. It truncates words with a common stem → this is called stemming (e.g. engineer & engineering both become the same word). Think of the mathematical analogy of number factoring versus least common denominator.
4. Create a document-term matrix that measures the frequency of words that remain after the corpus “cleanup” in step 3. This is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. Rows correspond to documents in the collection & columns correspond to terms.
You are left with primary outputs that enable you to do counts in each cell.
You’ve datafied, or quantified, words that others only qualify, which prevents analysis.
You can now do lots of interesting stuff!
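The slides describe this pipeline in R; the same steps can be sketched in plain Python. The stop-word list and the suffix-stripping stemmer below are deliberately naive assumptions made for illustration (real stemmers such as Porter's are more careful), and the two sample documents are invented.

```python
import string
from collections import Counter

# Small, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"but", "and", "the", "a", "an", "of", "to", "is", "in"}
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one common suffix if at least four letters remain (not a real stemmer)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Lowercase, drop punctuation and stop words, then stem what is left."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [naive_stem(word) for word in text.split() if word not in STOP_WORDS]


def document_term_matrix(documents):
    """Return (terms, matrix): one row per document, one column per term."""
    counts = [Counter(clean(document)) for document in documents]
    terms = sorted(set().union(*counts))
    matrix = [[count[term] for term in terms] for count in counts]
    return terms, matrix


if __name__ == "__main__":
    docs = [
        "The engineer is engineering a bridge.",
        "Bridges and engineers, but no roads.",
    ]
    terms, matrix = document_term_matrix(docs)
    print(terms)
    for row in matrix:
        print(row)
```

Each cell is a count, which is what makes the text available for clustering, frequency analysis and the other techniques that follow.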
Cluster analysis of the term-document matrix reveals prevalent themes.
Document-term matrix
Cluster analysis → review how all your words cluster in your data matrix.
The result of this analysis is that we can reduce our matrix to fewer columns.
Font size & even color are embedded with information.
This information is actionable.
For centuries we have manually counted sets of words to determine their frequencies.
Zipf's law states that given some corpus of
natural language utterances, the frequency of any
word is inversely proportional to its rank in the
frequency table.
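Zipf's law can be written as f(r) ≈ C / r, so rank times frequency should stay roughly constant. The sketch below builds a rank-frequency table and exposes that product; the tiny corpus is constructed to fit the law exactly and is purely illustrative, since real text only follows it approximately.

```python
from collections import Counter


def rank_frequency(words):
    """Return rows of (rank, word, frequency, rank * frequency), most frequent first."""
    ranked = Counter(words).most_common()
    return [
        (rank, word, frequency, rank * frequency)
        for rank, (word, frequency) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    # Artificial corpus with frequencies 12, 6, 4, 3 (that is, 12/1, 12/2, 12/3, 12/4).
    corpus = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3
    for row in rank_frequency(corpus):
        print(row)
```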
Used for resumes as a way to
increase information density – to
be covered at a future webinar.
• With these data sets, we can run sentiment analysis!
• Determine the occurrence rate of certain themes qualified as opinions.
• To determine if people like a restaurant, we’d look at the words reviewers used on social media in the comment section.
Love: 10
Like: 7
Dislike: -7
Hate: -10
Qualitatively, we quantify the weakness or strength of these signals.
We determine words that correlate with having disliked or liked the movie, and to what degree, along a predetermined discrete continuum.
Pre-established words in narrative responses, now embedded in clusters, signal positive or negative statements about a movie, restaurant or Hammacher Schlemmer customer review.
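A lexicon-based scorer is about the simplest form this can take. The sketch below uses the four word scores from the slide; the function name and sample reviews are hypothetical, and the approach ignores negation ("not love") and context, which practical sentiment tools have to handle.

```python
import string

# Word scores taken from the slide's discrete scale.
LEXICON = {"love": 10, "like": 7, "dislike": -7, "hate": -10}


def sentiment_score(review):
    """Sum the lexicon scores of the words in a review; unknown words count as 0."""
    cleaned = review.lower().translate(str.maketrans("", "", string.punctuation))
    return sum(LEXICON.get(word, 0) for word in cleaned.split())


if __name__ == "__main__":
    for review in (
        "I like the service, love the view!",
        "We dislike the prices.",
        "I love the pasta but hate the noise.",
    ):
        print(sentiment_score(review), review)
```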
The difference between analog and digital signals is that
an analog signal is a continuous electrical message, while a digital signal is a
series of discrete values that represent information.
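Digitizing a sine wave makes the distinction concrete: sample the continuous wave at fixed intervals, then round each sample to one of a few allowed levels. The sample rate, the number of levels and the function names are assumptions chosen to keep the sketch small.

```python
import math


def sample_sine(frequency_hz, sample_rate_hz, n_samples):
    """Read a continuous sine wave at evenly spaced instants."""
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def quantize(samples, levels):
    """Map each sample in [-1, 1] to the nearest of `levels` evenly spaced integer codes."""
    step = 2.0 / (levels - 1)
    return [round((sample + 1.0) / step) for sample in samples]


if __name__ == "__main__":
    analog_like = sample_sine(frequency_hz=1, sample_rate_hz=8, n_samples=8)
    digital = quantize(analog_like, levels=5)
    print(digital)  # a staircase approximation of one full cycle
```

With more samples per second and more levels, the staircase gets closer to the original curve, which is the trade-off behind every digital recreation of an analog signal.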
To determine which traits can predict future outcomes, look at historical data.
Correlate “judgements” to see whether groupings found in one data set also predict
outcomes in another data set.
This is cross-validation, and it is done using historical data sets.
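A minimal k-fold cross-validation sketch: hold out part of the historical data, learn from the rest, and check how well the learned groupings predict the held-out part. The "model" here is just the most common outcome per grouping, and the rows and function names are hypothetical; the sketch also assumes k is no larger than the number of rows.

```python
from collections import Counter, defaultdict


def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


def fit_majority(rows):
    """Learn the most common outcome for each grouping."""
    by_group = defaultdict(Counter)
    for group, outcome in rows:
        by_group[group][outcome] += 1
    return {group: counts.most_common(1)[0][0] for group, counts in by_group.items()}


def cross_validate(rows, k):
    """Average accuracy on the held-out fold across all k folds."""
    accuracies = []
    for train, test in k_fold_splits(len(rows), k):
        model = fit_majority([rows[i] for i in train])
        hits = sum(model.get(rows[i][0]) == rows[i][1] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Made-up historical rows: (grouping, outcome).
    history = [("A", "yes"), ("B", "no"), ("A", "yes"),
               ("B", "no"), ("A", "yes"), ("B", "no")]
    print(cross_validate(history, k=3))
```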
Master Algorithms script other algorithms on an as-needed basis, free of human interaction.
Machine to machine (M2M): technology that enables networked devices to exchange information & perform actions without the manual assistance of humans.
This is what is replacing traditional managerial jobs.
Firms that still employ these types of jobs feel less pressure to keep salaries at pace with inflation over time.
Machine learning can test statistical models, for example, testing against known political party membership & updating the algorithm as new data comes in.
In M2M, we let data points come in, refresh & update to automatically script an even more accurate algorithm.
Can infer your political affiliation from your first 19 likes, even if those likes are completely apolitical.
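"Updating the algorithm as new data comes in" can be illustrated with an online perceptron: each new labeled example adjusts the weights only when the current prediction is wrong. This is a sketch of the general idea, not the method behind the likes study; the page names, the +1 / -1 labels standing for two unnamed groups, and the function names are all invented.

```python
def predict(weights, likes):
    """Return +1 or -1 depending on the summed weights of the liked pages."""
    score = sum(weights.get(page, 0.0) for page in likes)
    return 1 if score >= 0 else -1


def update(weights, likes, label, learning_rate=1.0):
    """Perceptron rule: change the weights only when the prediction is wrong."""
    if predict(weights, likes) != label:
        for page in likes:
            weights[page] = weights.get(page, 0.0) + learning_rate * label
    return weights


if __name__ == "__main__":
    # Made-up stream of (liked pages, known group label) arriving over time.
    stream = [
        ({"hiking", "jazz"}, 1),
        ({"nascar", "bbq"}, -1),
        ({"jazz", "sushi"}, 1),
        ({"bbq", "hunting"}, -1),
    ]
    weights = {}
    for likes, label in stream:
        update(weights, likes, label)
    print(weights)
    print(predict(weights, {"bbq"}))
```

No human rewrites the rule between examples; the weights are the "script" and the incoming data edits it.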
What Can I Do Tomorrow Morning at the Office?
1. Take inventory of the data you already collect
A. Internal data.
B. External data accessed via the FOI Act – to be discussed subsequently.
C. External data legally purchased from vendors (Yelp, FB, DoubleClick, etc.).
D. Create a glossary of data definitions. (headcount example)
2. Determine decisions to derive from Big Data
A. Select the most pressing problem based on the Pareto 80/20 rule.
B. In plain English, state your problem statement.
C. Write down independent variables (the inventory of data at your disposal).
D. Determine the dependent variable (the preferred outcome of your problem statement).
3. Write down your hypothesis
4. Contact your IT or data science department. If not …
5. Contract STEM grad students & turn them into data scientists
6. Code your hypothesis
Even if I hate coding and math!
Quantitative Skills
The Freedom of Information Act (FOIA), 5 U.S.C. § 552, is a federal law that allows for disclosure of previously unreleased information controlled by the US government.
Correlate external data with troves of data from the US gov’t. (Examples: MTA apps)!
Enacted in 1966, it allows U.S. citizens to petition government for official information.
State the business problem you are trying to solve in plain language as a problem statement.
State it as a hypothesis.
Collect data from systems already in place.
Test the hypothesis.
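"Coding a hypothesis" can be as small as fitting a line between one independent variable and the dependent variable and looking at the slope. The example below assumes a made-up hypothesis ("more training hours go with higher sales") and made-up numbers; a positive slope is consistent with the hypothesis but is not a significance test.

```python
def least_squares(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data already sitting in internal systems.
    training_hours = [0, 2, 4, 6, 8]      # independent variable
    monthly_sales = [10, 14, 18, 22, 26]  # dependent variable
    slope, intercept = least_squares(training_hours, monthly_sales)
    print(f"sales = {slope} * hours + {intercept}")
```

If the slope came out near zero or negative, the data would not support the hypothesis and the problem statement would need another look.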
Coding is the new literacy.
Coding classes: most are on-line, a few on-site. Some free & some at cost.
Most of you will not be competing with other coders – just other Marketing, HR or Financial professionals who know nothing about coding!
Should I learn to read?
Should I learn how to use the internet?
Should I learn about coding?
A little about R
• R – Free
• Contains embedded tools to pull external data
• Tools that scrape data from websites (Reuters, as one example)
• Text mining: KNIME (another software tool for text mining) – you can download it. (Pronounced like “nine” but with an “m”. Has a graphical interface instead of using a scripting language.)
• Remember, word clouds are an example of text mining.
• R was written in the C language – coders wrote functions in C to create macros in R to pull data, analogous to a macro in Excel.
• R will let you pull data into a corpus.
KNIME (Konstanz Information Miner) → open source data analytics, reporting & integration platform. It integrates various components for machine learning & data mining.
You’re not competing against other coders.
You’re competing against others in your field that know
nothing about coding.
Facebook accomplished what democratic gov’t tried but failed to do – build a database of citizens.
Datafication takes all aspects of life & turns them into data.
Google’s augmented reality glasses datafy the gaze
Twitter datafies stray thoughts
LinkedIn datafied professional networks
The Floor as a Giant iPad via surface-based computing technology
No thank you, I’m just looking.
That’s okay, I’m datafying your every move.
Touch-sensitive floor customers walk on
Download a Coupon Texted to You
• What aisles did you walk down or ignore?
• In what sequence did you browse the aisles?
• How long were you in the store?
• What is length of time between store visits?
• How long did you linger in front of the cereal aisle?
• When you checked out, did cereal wind up in your cart? How many boxes?
Compare viewing patterns with what wound up in your shopping cart.
Script algorithms to better relate independent variables (what the store stocks) to the dependent variable of revenue thresholds.
So, what’s my role again in a Big Data World?
As Big Data becomes ubiquitous, what skills mark points of differentiation?
• Discovering latent needs & intuition that goes against the facts?
• The mere ability to define a problem precedes its solution.
Big Data has a quantitative & a qualitative side.
And if you hate math, there are qualitative skills to harness:
• Develop observational skills to separate signal from the noise
• Take inventory of existing data
• Learn to develop hypotheses to test
• Learn how to access external data (FOIA, LinkedIn, etc.)
• Liaise between internal ERP data & external data
• Network with STEM students to contract data scientists
Your Role in a Big Data World
If Ford queried BIG DATA to discover what customers want, he’d
come up with faster horses that required less water.
In a Big Data world, traits to be developed:
• Creativity
• Intuition
• Intellectual curiosity
• Leveraging errors
• Risk taking
Read outside your discipline
non sequitur
Vincent Suppa © 2016
Don’t be afraid to fail.
Business is not figure skating. It’s the X games!
If you fail, have quick-to-market failures that mitigate loss & allow you to harvest what did work for the next initiative.
Capitalism without failure is like a religion without sin.
Recommended Courses
NetCom Learning offers a comprehensive portfolio of Big Data training
options. Please see below the list of recommended courses with upcoming
schedules:
Introduction to Python Programming
Essential Python
Introduction to Python Scripting: for the Security Analyst
Check out more Big Data training options with NetCom Learning.
Our live webinars will help you touch base on a wide variety of IT, soft skills and business
productivity topics, and keep you up to date on the latest IT industry trends. Register
now for our upcoming webinars:
A Brief on Benefits of ITIL for the Organization – April 4
Visualization with Tableau to Enhance Efficiency in Organization – April 6
How Machine Learning Helps Organizations to Work More Efficiently? – April 11
Why Certified Associate in Project Management (CAPM) and How to Prepare? - April 18
A Brief About DevOps and its Practices – April 20
Special Promotion
Whether you're learning new IT or business skills or developing
a learning plan for your team, for a limited time, register for our
Guarantee to Run classes and get 25% off the course price.
To get latest technology updates, please follow our social media pages!
THANK YOU !!!

Editor's Notes

  • #41 Words are now treated as data when computers mine centuries’ worth of books. Even friendships and “likes” are datafied via Facebook.
  • #44 It can infer the probability that a traffic light is green and not red “or”
  • #82 65 slides as of October 24 2016 -