Jobs Complexity
1. The Future of Jobs:
Science of Surprise and Big Data
http://www.slideshare.net/ssood/jobs_complexity
linkedin.com/in/sureshsood
@soody
2. Key Areas of Discussion
1.) Data Science, Big data and Future of Jobs
2.) Complex Systems
3.) Complex Systems Representations:
Agent Based Models
Social Networks
3. Vocabulary
1. Big Data
2. Kolmogorov Complexity
3. Hadoop
4. Complex Systems
5. Complicated
6. Agent based models
7. Particle Swarm Optimisation
8. Preferential Attachment
9. Social Network Representation
6. Variety of Data Types & Big Data Challenge
1. Astronomical
2. Documents
3. Earthquake
4. Email
5. Environmental sensors
6. Fingerprints
7. Health (personal) Images
8. Graph data (social network)
9. Location
10. Marine
11. Particle accelerator
12. Satellite
13. Scanned survey data
14. Sound
15. Text
16. Transactions
17. Video
Big Data consists of extensive datasets primarily in the characteristics of
volume, variety, velocity, and/or variability that require a scalable
architecture for efficient storage, manipulation, and analysis.
Computational portability is the movement of the computation to the location of the data.
9. The Newman Model of Deception
(Pennebaker et al)
Key word categories for deception mapping:
• Self words, e.g. “I” and “me”, decrease when someone distances themselves from the content
• Exclusive words, e.g. “but” and “or”, decrease with fabricated content owing to the complexity of maintaining deception
• Negative emotion words, e.g. “hate”, increase owing to shame or guilty feelings
• Motion verbs, e.g. “go” or “move”, increase as exclusive words go down, to keep the story on track
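As a rough sketch of how the four keyword categories could be counted in a piece of text: the word lists below contain only the example words quoted on this slide (a real lexicon would be far larger), and the function name `category_rates` is hypothetical.

```python
import re
from collections import Counter

# Illustrative lexicon only: just the example words quoted on the slide.
CATEGORIES = {
    "self": {"i", "me"},
    "exclusive": {"but", "or"},
    "negative_emotion": {"hate"},
    "motion": {"go", "move"},
}

def category_rates(text):
    """Return the share of words in `text` that fall in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for name, vocab in CATEGORIES.items():
            if word in vocab:
                counts[name] += 1
    total = len(words) or 1  # avoid division by zero on empty text
    return {name: counts[name] / total for name in CATEGORIES}

rates = category_rates("I hate to go, but I had to move.")
print(rates)
```

Comparing these rates between statements, rather than reading any single rate in isolation, is what the increase and decrease wording above suggests.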
12. Hadoop Configurations (Single and Multi-Rack)
Adapted from: http://stackiq.com/
Cluster manager e.g. Apache Ambari, Apache Mesos, or Rocks
3 TB drives, 18 data nodes:
configuration represents 648 TB of raw storage.
With the HDFS standard replication factor of 3,
this gives 216 TB of usable storage
Name/secondary/data nodes – 6 core 96 GB
Management node – 4 core 16 GB
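The storage figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch: the slide does not state the number of drives per node, so the value of 12 is an assumption inferred from 648 TB / (18 nodes × 3 TB), and `hdfs_capacity` is a hypothetical helper name.

```python
def hdfs_capacity(data_nodes, drives_per_node, drive_tb, replication=3):
    """Return (raw_tb, usable_tb) for a cluster of identical data nodes."""
    raw_tb = data_nodes * drives_per_node * drive_tb
    # Each block is stored `replication` times, so usable space shrinks by that factor.
    usable_tb = raw_tb / replication
    return raw_tb, usable_tb

# drives_per_node=12 is inferred from the slide's totals, not stated on it.
raw_tb, usable_tb = hdfs_capacity(data_nodes=18, drives_per_node=12, drive_tb=3)
print(raw_tb, usable_tb)
```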
13. http://tacocopter.com/
New Sources of Information (Big data) : Social Media + Internet of Things
Innovations
7,919 40,204
2,003,254,102 51
Gridded Data Sources
14. The following BigQuery query selects GDELT Global Knowledge Graph records whose themes mention the Islamic State together with terror or suicide-attack themes (note that the wildcard on "TAX_WEAPONS_SUICIDE_" catches suicide vests, suicide bombers, suicide bombings, suicide jackets, and so on):
SELECT DATE, DocumentIdentifier, SourceCommonName, V2Themes, V2Locations, V2Tone, SharingImage, TranslationInfo
FROM [gdeltv2.gkg]
WHERE (V2Themes LIKE '%TAX_TERROR_GROUP_ISLAMIC_STATE%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIL%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIS%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_DAASH%')
  AND (V2Themes LIKE '%TERROR%TERROR%'
    OR V2Themes LIKE '%SUICIDE_ATTACK%'
    OR V2Themes LIKE '%TAX_WEAPONS_SUICIDE_%')
The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record, spanning
the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest open-access
database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates spanning over 12,900
days, making it one of the largest open-access spatio-temporal datasets as well.
GDELT + BigQuery = Query The Planet
16. The ANZ Heavy Traffic Index comprises flows
of vehicles weighing more than 3.5 tonnes
(primarily trucks) on 11 selected roads
around NZ. It is contemporaneous with GDP
growth.
The ANZ Light Traffic Index is made up of
light or total traffic flows (primarily cars and
vans) on 10 selected roads around the
country. It gives a six month lead on GDP
growth in normal circumstances (but cannot
predict sudden adverse events such as the
Global Financial Crisis).
http://www.anz.co.nz/about-us/economic-markets-research/truckometer/
ANZ TRUCKOMETER
17. Black Box Insurance
• Big data transforms actuarial insurance from using probability methods to estimate premiums into dynamic risk management
using real data generating individually tailored premiums
• Estimate: for a 20 km journey to work or home, a data point is acquired every minute and the journey captures 12 points per km. Assuming 1,000 km of driving per month, a car generates 12,000 points per month, or 144,000 points per annum. Hence, 1,000 cars generate 144 million points per annum.
• A telematics (black box) monitor helps assess driving behavior and price the policy with truly driver-centric premiums by capturing:
– Number of journeys
– Distances travelled
– Types of roads
– Speed
– Time of travel
– Acceleration and braking
– Any accidents
– Location ?
• Benefits low-mileage, smooth and safe drivers
• Privacy vs. saving money on insurance (Canada; http://bit.ly/Black_box)
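The data-volume estimate on this slide can be reproduced step by step. The sketch below uses only the figures given above; the constant names are illustrative.

```python
POINTS_PER_KM = 12      # data points captured per km of driving
KM_PER_MONTH = 1_000    # assumed monthly distance per car
CARS = 1_000            # size of the insured fleet in the example

points_per_month = POINTS_PER_KM * KM_PER_MONTH   # per car, per month
points_per_year = points_per_month * 12           # per car, per annum
fleet_points_per_year = points_per_year * CARS    # whole fleet, per annum

print(points_per_month, points_per_year, fleet_points_per_year)
```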
18. The Future of the Professions
(Susskind & Susskind 2015)
– Tax and audit work replaced by computer assisted techniques
– Technology automating and innovating
– Accounting work reconfiguring
– New business models
– Move from bespoke to “off the peg”
– Mastery of data with new tools and techniques - Big Data
– Diversification
– Shift to proactivity from reactivity
– Professionals replaced by less expert people and high performing systems
– Post-professional society expertise available online
19. The Future of the Professions: How Technology Will Transform the Work of Human Experts, Richard Susskind and Daniel Susskind (2015)
22. Google Trends Worldwide, Australia and New Zealand - Accounting + Analytics
January 2004 - September 2015
Worldwide
Australia
New Zealand
23. 2020 Global Data Forecast (Bytes)
2020 estimates suggest four times more digital data than all the grains of sand on Earth
Source: Pg. 4, Building a Digital Analytics Organization: Create Value by Integrating Analytical Processes,
Technology, and People into Business Operations by Judah Phillips, FT Press, 30 Jul 2013
24. We’re sitting on a big data time bomb
Catastrophic loss of transparency. Few IT professionals
have experience managing big data platforms at scale
— a situation that has created a massive skills
shortage in the industry. By 2018, U.S. companies will
be short 1.5 million managers able to make data-
based decisions. A recent McKinsey Quarterly report
estimates that, in order to close this gap, companies
would need to spend 50 percent of their data and
analytics budget on training frontline managers; it also
notes that few companies realize this need.
Source: CAMERON SIM, CREWSPARK, OCTOBER 24, 2015
http://venturebeat.com/2015/10/24/were-sitting-on-a-big-data-time-bomb/
25. Australia/NZ needs “30,000 data savvy managers by 2018”
• This statement derives from the McKinsey (2011) study, which found “a shortage of talent necessary for organizations
to take advantage of big data. By 2018, the United States requires a talent pool of 140,000-190,000 people with deep
analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big
data to make effective decisions”.
• Taking 2% of the US economy as a rule of thumb (2% of 1.5 million is 30,000), in 2018 Australia will require another 30,000 managers
or analysts. However, the shortage commences well before 2018. These numbers do not accommodate
the training of managers or analysts for overseas destinations.
• Another 2011 study from EMC Corporation interviewed nearly 500 data science and business intelligence
professionals globally. Two-thirds of the informants believe demand will outpace supply and 30% from
disciplines outside of computer science. Additionally, the study found the biggest obstacle to data science
as being education and training.
• A 2012 study “Data Equity: Unlocking the value of big data” commissioned by SAS UK and conducted by
the Centre for Economics and Business Research, an independent business research consultancy, found
unlocking big data leads to adding another 58,000 jobs to the UK economy (2012-2017).
• Gartner (2012) estimates that by 2015, 4.4 million IT jobs will be created globally to support big data, 1.9
million of them in the United States alone.
• Closer to home, the 2013 Hudson study “Tackling the Big Data Challenge” found 78% of the Australian
research informants “believe organisations do not have the skills and competencies to successfully
undertake a big data project”.
• Building on the McKinsey (2011), Gartner (2012) and Hudson (2013) estimations, Australia and the world
requires 3 distinct but related skills. Most specifically, the demand is very strong for data savvy managers
conversant with big data practice.
27. India’s high demand for big data workers contrasts
with scarcity of skilled talent
The talent deficit is on two fronts, said Velamakanni: data
scientists who can perform analytics, and analytics consultants
who can understand and use the data. The first, big data
engineers and scientists, are extremely scarce. "In the second
category, we need better quality, and India is going to be short
of a million data consultants soon," he said.
Source: India's high demand for big data workers contrasts with scarcity of skilled talent, Saritha Rai, June 2014,
http://www.techrepublic.com/article/indias-high-demand-for-big-data-workers-contrasts-with-scarcity-of-skilled-talent/
28. ‘The Predictive Accountant’ Persona
1. CA SMP Practice and Member
2. Data savvy
3. Focus shifts from being reactive to proactive and predictive
4. Leverages accounting data and predictive analytics software to find patterns in data and
insights
5. Uses the tools and dashboards to predict client scenarios ahead of time: maximising
opportunity, limiting risks and proactively advising.
6. CA ANZ SMPs benefit from analytics by adding value when connecting SME client
challenges and opportunities to identified customer patterns. Sharing these insights
delivers more value in the accounting conversations and helps tackle the real business
problems facing clients.
29. ‘The Predictive Accountant’ Portal
The Predictive Accountant Data Sources
Predictive
Analytics
Excel style
dashboard
Connected Practice
Digital Marketing / eNewsletters/ Integrated
business tools software
Apps Marketplace
Accounting Analytic Apps
Education
Analytic Training
30. What is Machine Learning?
Machine learning is a scientific discipline that
deals with the construction and study of
algorithms that can learn from data. Such
algorithms operate by building a model based
on inputs and using that to make predictions or
decisions, rather than following only explicitly
programmed instructions.
http://en.wikipedia.org/wiki/Machine_learning
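To make the definition concrete, here is a minimal sketch of an algorithm that builds a model from inputs and then uses it to predict, rather than following hand-written rules. It fits a straight line by ordinary least squares; the data points are made up for illustration and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from example (x, y) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data that happens to follow y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)    # the "model" learned from data
prediction = slope * 5 + intercept     # prediction for an unseen input
print(slope, intercept, prediction)
```

Nothing in the code states the rule y = 2x + 1; the relationship is recovered from the examples, which is the distinction the definition draws.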
34. Complex Systems
• Complex Adaptive Systems
• Whole is greater than the sum of the parts
• Many parts
• Relationships “the missing link”
• Emergence and self-organisation
• Non-linear and dynamic patterns
• Difficult to predict outcomes & manage
(little things can make a big difference)
35. Fingerprints of complexity
• medium-sized number of agents (3 to 10^23, but usually a few hundred)
• intelligent (rules following) and adaptive
• local information (no global information, just your neighbors)
• Emergent patterns from very simple local rules.
Example: a school of fish follows 3 rules: keep a minimum distance, match the direction of
neighbors, steer toward their average position.
Examples of complex systems: stock markets, road traffic networks,
evolutionary ecosystems, supermarkets, national economies, health care
delivery systems, communications networks, the insurance industry
36. For a Moment, I Want You to Think Like an Agent
• Pick two people in the room at random. Label them
A and B.
• Scenario 1 – Move around and keep A between you
and B
• Scenario 2 – Move around and get between A and B
37. Complex Self Organising Systems
3 Rule Model of Self-Organization for a school of fish
by Craig Reynolds (1986)
Separation
keep a minimum distance from your neighbors
Alignment
steer in the average direction of your neighbors
Cohesion
steer toward the average position of your neighbors
Reference: http://www.red3d.com/cwr/boids/
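The three rules can be sketched as steering vectors computed from an agent's neighbors only. This is an illustrative simplification, not Reynolds' original code: the `Boid` class, the `steer` function, and the `min_dist` value are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float

def steer(boid, neighbors, min_dist=1.0):
    """Return (separation, alignment, cohesion) steering vectors for one boid."""
    if not neighbors:
        return (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)
    n = len(neighbors)

    # Separation: push away from any neighbor closer than min_dist.
    sep_x = sep_y = 0.0
    for other in neighbors:
        dx, dy = boid.x - other.x, boid.y - other.y
        if (dx * dx + dy * dy) ** 0.5 < min_dist:
            sep_x += dx
            sep_y += dy

    # Alignment: turn toward the neighbors' average velocity.
    ali = (sum(o.vx for o in neighbors) / n - boid.vx,
           sum(o.vy for o in neighbors) / n - boid.vy)

    # Cohesion: head toward the neighbors' average position.
    coh = (sum(o.x for o in neighbors) / n - boid.x,
           sum(o.y for o in neighbors) / n - boid.y)

    return (sep_x, sep_y), ali, coh

me = Boid(0.0, 0.0, 1.0, 0.0)
others = [Boid(0.5, 0.0, 0.0, 1.0), Boid(2.0, 2.0, 0.0, 1.0)]
separation, alignment, cohesion = steer(me, others)
print(separation, alignment, cohesion)
```

In a simulation each agent would add a weighted mix of the three vectors to its velocity at every step; no agent needs global information about the group.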
39. No single indicator is sufficiently indicative to trigger
an alert. But together there is enough information
Indicators: Anomalies? Lateral drift? Micronods? Large motions? Stillness?
Combined outcome: Drowsiness (Surprise)
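The idea that weak indicators only become informative together can be sketched as follows. The indicator names come from the slide, but the scores and both thresholds are invented for illustration, and simple summation is just one possible way to combine evidence.

```python
# Hypothetical settings: these numbers are not from the source.
SINGLE_THRESHOLD = 0.8     # level at which one indicator alone would alert
COMBINED_THRESHOLD = 2.0   # level at which the indicators together alert

def alerts(indicators):
    """Return (single_alert, combined_alert) for a dict of scores in [0, 1]."""
    single = any(score >= SINGLE_THRESHOLD for score in indicators.values())
    combined = sum(indicators.values()) >= COMBINED_THRESHOLD
    return single, combined

# Made-up readings: each is moderate, none is alarming on its own.
readings = {
    "lateral_drift": 0.5,
    "micronods": 0.6,
    "large_motions": 0.4,
    "stillness": 0.6,
}

single_alert, combined_alert = alerts(readings)
print(single_alert, combined_alert)
```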
40. A similar approach can identify the convergence of
terrorism-related activities
Indicators: contact with other suspects, last-minute ticket purchase, financial discrepancies,
passport irregularities, prior criminal record
Combined outcome: Terrorism Event (Surprise)
41. Complication and Complexity in the
Business World
• Complication in business is the large variety of
"things" in a business that need to be managed
• The nature of complexity is the sometimes
surprising consequences arising from relationships
and interactions
42. Social Network Representation
• Primary focus is actors & relationships, not actors & attributes
• Nodes (Actors) connected by Links (Ties/relationship or edge)
• Links represent flows or transfer
– material goods or information
Adjacency matrix (1 = presence of link, 0 = no direct link):
    1 2 3
1   0 1 0
2   1 0 1
3   0 1 0
Adjacency list:
1: 2
2: 1, 3
3: 2
Graph or sociogram (actors joined by relationships):
1 - 2 - 3
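The adjacency matrix and adjacency list describe the same three-actor network, and one can be derived from the other. A minimal sketch, with `matrix_to_list` as a hypothetical helper name:

```python
# Adjacency matrix for actors 1, 2, 3: 1 = link present, 0 = no direct link.
matrix = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

def matrix_to_list(matrix):
    """Convert an adjacency matrix to an adjacency list keyed by 1-based actor number."""
    return {
        i + 1: [j + 1 for j, linked in enumerate(row) if linked]
        for i, row in enumerate(matrix)
    }

adjacency_list = matrix_to_list(matrix)
print(adjacency_list)
```

The matrix makes it cheap to test whether two given actors are linked, while the list is more compact for sparse networks where most pairs are not connected.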
46. Train of Thought Analysis
• A bottom-up approach
• Perceptual process of discovery to uncover
structure
• Distinguish patterns, structure, relationships and
anomalies
• Knowledge is colour coded
• Investigator can spot irregularities
• Not sure why but where does this lead
• Harnesses the power of the human mind
Data Information Knowledge
48. Data Sources
• Police Incident reports (multiple jurisdictions)
• Phone records
• Financial information
• Intelligence information
• Interviews with :
– Witnesses
– suspects
– confidential informants
• Satellite imagery
49. Profile of the Killer
• probably drove a four wheel drive vehicle to access the
remote bush tracks
• had some knowledge of the forest
• the same gun was used to kill two victims.
(A rare US-made Ruger 10/22 rifle)
– RTA vehicle records
– Gym memberships
– Gun licensing
++
50. "As a key member of the NSW Police Team involved in the
investigation of the "backpacker" murders in Belanglo Forest,
NSW, I believe it is fair to say that NetMap proved to be of
invaluable assistance, certainly in the area of correlating massive
amounts of disparate data pertaining to the case – data that
would otherwise have taken significant amounts of resources to
analyse.
NetMap was able to reduce what would have been many years of
information analysis into weeks and help lead the investigation
team to a faster close on the issue.
I believe that NetMap succeeded in providing the NSW Police
with a unique view of information related to the Backpacker
Murderer case, information that allowed investigations to be more
focused and efficient in terms of progressing the case to a faster,
more accurate close. Its capacity to quickly identify ubiquitous
links enabled our analysts to develop revolutionary strategies to
combat criminal enterprises."
Angus A. Graham, Former Commander, Criminal Research Bureau,
NSW Police State Intelligence Group.
52. NodeXL - Excel 2007/10/13 workbook template for viewing and analyzing networks
http://nodexl.codeplex.com/releases/view/108288
53. Import ego, Fan page and groups networks from Facebook using
Social Network Importer for NodeXL
http://socialnetimporter.codeplex.com/
54. The Open Graph Viz Platform Gephi runs on Windows, Linux and Mac OS X.
Gephi is open-source and free.
https://gephi.org/
56. “I had come to an entirely erroneous
conclusion which shows, my dear
Watson, how dangerous it is to
reason from insufficient data.”
The Adventure of the Speckled Band
Editor's Notes
Combine traditional and social data to create a Social CRM
Build social fields into customer contact information
Track social media interactions with customers.
Understand where customers hang with social media data
Collect customer feedback from social channels.
Shifting to our experience, fraud is a problem, and empirically, industry-wide, it is mostly committed by people on your payroll.
These are the people best able to conceal their activity because of the complexities of doing commerce.
However, the data is a proxy for their behaviors, suspicious or otherwise.
Diana: most links (degree centrality); the most connected node, a connector or hub; measured by the number of nodes connected; high influence in spreading information or a virus
Heather: best location; a powerful figure as broker who determines what flows and what doesn't; a single point of failure; high betweenness = high influence; the node's position as gatekeeper exploits structural holes (gaps in the network)
Fernando & Garth: shortest paths = closeness; the bigger the number, the less central
Eigenvector: importance of a node in the network; Google's PageRank is a similar measure; being connected to the well connected is a measure of popularity and power
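Two of these measures can be sketched on a small example. The five-node network below is hypothetical (it is not the network from the slides): degree counts a node's links, and the closeness figure is computed as the sum of shortest-path distances to all other nodes, so a bigger number means a less central node, as noted above.

```python
from collections import deque

# Hypothetical network: A is a hub, E hangs off D.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]

graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def degree(graph):
    """Number of direct links per node (degree centrality, unnormalised)."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def farness(graph, source):
    """Sum of shortest-path lengths from `source` to every other node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return sum(dist.values())

degrees = degree(graph)
farness_by_node = {node: farness(graph, node) for node in graph}
print(degrees)
print(farness_by_node)
```

Here A has both the highest degree and the smallest distance total, while E, at the end of the tail, has the largest total and is the least central.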