This document discusses strategies for managing large datasets, known as "big data". It identifies several challenges, such as ensuring that the proper hardware, software, data analysis tools, and report formats are selected. Effective big data management requires tightly defining problems, understanding user needs, and selecting fast platforms tailored to the data volume and type. Data mining software and a clear definition of the data's structure are also important. The most important consideration is producing reports that end users can easily understand.
1. copyright Earthsongs Holistic LLC
2014
Big Data: Strategies and Synergies
Melinda H. Connor, D.D., Ph.D., AMP, FAM
Adjunct Professor, Akamai University
2.
Melinda H. Connor, D.D., Ph.D., AMP, FAM
Adjunct Professor, Akamai University, Hilo, Hawaii
Science Advisor, Spirituals for the 21st Century, Georgia, and Nolan Payton Archive of Sacred Music, California State University Dominguez Hills
CEO, National Foundation for Energy Healing
Dr. Connor is the former team lead of level 3 support for IBM’s Business Intelligence Technical Support Group.
Melinda_Connor@mindspring.com
3. What are the “Big Issues” around “Big Data”?
4. Challenges:
• Quality of programming skills of the computer programmers.
• Level of problem definition.
• Level of actual problem understanding in the specific area.
• Correct hardware to solve the issue.
• Correct software to solve the issue.
5. Challenges, cont’d:
• Intersection and compatibility of the hardware and software.
• Intersection and compatibility of the software on multiple platforms.
• Understanding of the end user’s needs.
• Production of the reports in a format that the end user can understand.
6. Client Quote
I don’t care how your software works. I don’t want to spend time with your software. I just want the data I need to run my business!
7. Flip Side:
• Poorly trained user community wanting turnkey solutions.
• The wrong people making the purchasing decisions.
• Poorly defined understanding of the “real” problem they are trying to solve.
• Poor-quality problem reports.
9. How can you utilize the terabytes per hour that you are receiving?
• Define the needs as closely as possible to match the needs of the business or situation.
• Do data mining! There will be more that you can use.
• Select the correct platform to do the processing at speed.
• Understand all of the tools that are available. Do not limit yourself to one company’s tools, but do write in clauses that the software must work together or no one gets paid.
10. What is the most effective management of this “big data”?
• Play both ends against the middle!
– One end is the problem you are trying to solve.
– The other end is the report the end user needs.
• Build fast platforms that are correctly sized for the load.
• Limit the bottlenecks in the hardware.
• Have the correct people do the purchasing and use industry specialists.
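Sizing a platform "correctly for the load" comes down to simple arithmetic once the numbers are known. The sketch below assumes both the ingest rate and the sustained throughput of a single processing node are known in TB per hour, and reserves a headroom fraction for spikes. The function name, the 30% default headroom, and the per-node throughput are illustrative assumptions, not figures from the slides.

```python
import math


def nodes_needed(ingest_tb_per_hour: float,
                 node_tb_per_hour: float,
                 headroom: float = 0.3) -> int:
    """Return how many processing nodes keep up with the ingest rate.

    headroom is the fraction of extra capacity reserved for spikes;
    0.3 (an assumed default) means sizing for 130% of the average load.
    """
    if node_tb_per_hour <= 0:
        raise ValueError("node throughput must be positive")
    required_tb_per_hour = ingest_tb_per_hour * (1 + headroom)
    # Round up: a fractional node still has to be a whole machine.
    return math.ceil(required_tb_per_hour / node_tb_per_hour)


print(nodes_needed(2.0, 0.25))  # 11
```

With 2 TB/hour arriving and nodes that each sustain 0.25 TB/hour, 2.6 / 0.25 = 10.4, so 11 nodes. The broadcast in slide 15, where 8,000 users hit a system built for 2,400 (about 3.3 times its design load), shows why an explicit headroom term is worth keeping.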
11. SPEED,
CORRECT PLATFORM,
CORRECT FORM OF DATABASE,
CORRECT TOOLS FOR ANALYSIS
and the
CORRECT FORM OF THE REPORT
12. What are the most effective ways of understanding the ecological landscape of the data you are receiving?
• Start by understanding the types of data you are collecting.
• Then understand the tools available.
• For example, object-oriented vs. relational databases: which do you use, and when do you use one or the other?
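A practical first step toward understanding the types of data you are collecting is to profile a sample of records before choosing a database. The heuristic here is an assumption, not a rule from the slides: fields that always appear with a single type fit a relational table well, while missing, mixed-type, or variable-length fields (like the cell phone data in slide 25) lean toward an object-oriented or document store. Records are assumed to arrive as Python dicts, and the sample data is hypothetical.

```python
from collections import defaultdict


def profile_fields(records):
    """For each field, return (sorted type names seen, number of records containing it)."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for record in records:
        for field, value in record.items():
            types[field].add(type(value).__name__)
            counts[field] += 1
    return {field: (sorted(types[field]), counts[field]) for field in types}


def looks_tabular(profile, n_records):
    """Heuristic: every field appears in every record, always with one type."""
    return all(len(type_names) == 1 and count == n_records
               for type_names, count in profile.values())


orders = [{"id": 1, "qty": 12}, {"id": 2, "qty": 7}]
calls = [{"id": 1, "words": ["hello"]}, {"id": "2", "noise_db": 41.5}]
print(looks_tabular(profile_fields(orders), len(orders)))  # True
print(looks_tabular(profile_fields(calls), len(calls)))    # False
```

A real decision would also weigh query patterns and volume; this only surfaces the "shape" of the sample.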
13. How do you determine new corporate strategic direction based on the data when the shape of the data itself is not clear?
By defining the problem that you are trying to solve very tightly. Then you get the data which answers the
14. How long do you keep the raw data?
• How much storage space do you have available, and how fast are you getting the data?
• What are your storage processing speeds, and how fast can you process the data that is available?
• Know where the bottlenecks are in the physical limitations of your hardware:
• For example, do you have a slow I/O handler?
• Know the limitations in the way your database is designed:
• File vs. table vs. row/column locking!
• What about threading?
• When is the OS going to start thrashing?
• What about the speed of memory allocation?
• What are the legal requirements?
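The first two questions on this slide (storage available versus how fast data arrives) determine how many days of raw data you can keep, and the legal requirements set a floor on that number. A minimal sketch, assuming a steady ingest rate in TB per hour and a reserve fraction of storage held back as working space; the function names, the 20% reserve, and the legal minimum are hypothetical.

```python
def retention_days(storage_tb: float, ingest_tb_per_hour: float,
                   reserve_fraction: float = 0.2) -> float:
    """Days of raw data that fit in storage at a steady ingest rate.

    reserve_fraction keeps part of the storage free for working space
    (an assumed policy, not a figure from the slides).
    """
    if ingest_tb_per_hour <= 0:
        raise ValueError("ingest rate must be positive")
    usable_tb = storage_tb * (1 - reserve_fraction)
    return usable_tb / (ingest_tb_per_hour * 24)


def meets_legal_minimum(storage_tb, ingest_tb_per_hour, legal_min_days,
                        reserve_fraction=0.2):
    """True if storage can hold raw data for the legally required period."""
    return retention_days(storage_tb, ingest_tb_per_hour,
                          reserve_fraction) >= legal_min_days
```

For example, 960 TB with a 20% reserve leaves 768 TB; at 2 TB/hour (48 TB/day) that holds 16 days, enough for a hypothetical 14-day requirement but not a 30-day one.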
15. Real-World Example:
• Internet broadcast of a science experiment:
• 8k users logged on to a system designed for 2,400 users with different businesses.
• RESULT:
• Crashed every server in the system.
16. And what data will you dump?
• Everything you can! You will be getting more!
• Life/data runs in cycles. You will not hear or see the information only once. There are ways to back up the raw data and keep it for a number of years, but do you REALLY need that data?
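Because the data runs in cycles, one possible compromise between "dump everything you can" and years of raw backups is to discard old raw records while keeping a tiny per-day count, so the cycle itself stays visible. This is a sketch under assumptions: records are (timestamp, payload) pairs, the keep window is in days, and `prune_raw` and its summary format are hypothetical rather than anything the slides prescribe.

```python
from collections import Counter
from datetime import datetime, timedelta


def prune_raw(records, now, keep_days):
    """Split (timestamp, payload) records at now - keep_days.

    Returns the raw records to keep and a Counter of dumped records per
    calendar day, so the cyclic pattern remains visible after the raw
    data itself is discarded.
    """
    cutoff = now - timedelta(days=keep_days)
    kept = [(ts, payload) for ts, payload in records if ts >= cutoff]
    dumped_per_day = Counter(ts.date() for ts, _ in records if ts < cutoff)
    return kept, dumped_per_day


now = datetime(2014, 6, 30)
records = [
    (datetime(2014, 6, 29, 9), "recent"),
    (datetime(2014, 6, 1, 8), "old-1"),
    (datetime(2014, 6, 1, 17), "old-2"),
]
kept, summary = prune_raw(records, now, keep_days=7)
print(len(kept), summary[datetime(2014, 6, 1).date()])  # 1 2
```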
17. What about the limitations of the hardware of the various platforms and the network structure itself?
• Problem definition skills of decision makers.
• They do not define the needs of the business closely enough because they are not using the actual data.
• Do not understand sizing the volume of data properly so that the correct processing platform is selected.
• Do not understand what shape the final product needs to be in to be useful to the team.
18. Real-World Example:
• Hospital System (50 hospitals)
– Wanted to have end users on PCs, so selected a PC-based system which could not handle the processing load.
– Decided on centralized servers without tiered support.
– Did not purchase enough servers.
– Did not distribute network load effectively.
– Did not provide enough training on the software to medical personnel.
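The slide does not say how the hospital system's load should have been distributed. One common approach, shown only as an illustration, is greedy least-loaded assignment: each incoming unit of work goes to whichever server currently carries the smallest total load, tracked with a min-heap. The workload names, costs, and server names are hypothetical, and production load balancers also weigh latency, locality, and failover.

```python
import heapq


def assign_least_loaded(requests, servers):
    """Greedily route each (request_id, cost) to the least-loaded server.

    Returns the request-to-server assignment and the final load per server.
    """
    heap = [(0.0, name) for name in servers]
    heapq.heapify(heap)
    assignment = {}
    for request_id, cost in requests:
        load, name = heapq.heappop(heap)  # server with the smallest load
        assignment[request_id] = name
        heapq.heappush(heap, (load + cost, name))
    return assignment, {name: load for load, name in heap}


jobs = [("admissions", 5), ("labs", 3), ("pharmacy", 2)]
assignment, loads = assign_least_loaded(jobs, ["srv-a", "srv-b"])
print(assignment)  # {'admissions': 'srv-a', 'labs': 'srv-b', 'pharmacy': 'srv-b'}
print(loads)
```

Both servers end with a load of 5, whereas sending every job to one central server would have left the other idle.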
19. Programmer Training
• Issues with the training of the programmers:
– Many do not understand how to write the software to use the hardware most effectively.
– AND they do not understand the stacking.
– AND they do not understand how to optimize the code to make the best use of the compilers.
21. What are the most effective ways of data mining?
• Specialized software for the platform.
• Build the algorithms to determine if there are any random correspondences.
• Know what data you want to review.
• Build meta-data platforms whenever possible.
• Have the people doing the design and builds understand the shape of the data before they start!
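The slide does not name an algorithm for detecting random correspondences. One standard option, sketched here as an assumption, is a permutation test: compute the Pearson correlation of two series, then shuffle one series many times and count how often a shuffled pairing correlates at least as strongly. A large fraction suggests the correspondence could easily be random. The input series, trial count, and seed are illustrative.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)  # undefined (division by zero) if a series is constant


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Fraction of shuffled pairings whose |r| is at least the observed |r|.

    Small values mean the observed correspondence is unlikely to be random.
    """
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one so the estimate is never 0


xs = list(range(20))
ys = [2 * x + 1 for x in xs]
print(round(pearson(xs, ys), 6))          # 1.0
print(permutation_p_value(xs, ys) < 0.01)  # True
```

When many field pairs are screened this way, some will look significant by chance alone, so the threshold should be tightened accordingly.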
22. Real-World Example:
• Soft drink company in 122 countries:
• Needed to understand peak load days for manufacture and distribution.
• The problem being addressed was concurrence: when one country would have to support the overload of another.
• Meta-data was critical to understanding and defining the shape of the data.
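The concurrence problem above can be framed as a per-day check: on each day, which countries exceed their capacity, and is there enough spare capacity elsewhere to absorb the overflow? The sketch below is an illustration under simplifying assumptions: countries A, B, and C, their unit figures, and the pooling of all spare capacity (ignoring shipping distance and lead time) are all hypothetical.

```python
def overload_days(demand, capacity):
    """Find days when some countries exceed capacity and check whether the
    others' spare capacity can absorb the overflow.

    demand:   {country: {day: units demanded}}
    capacity: {country: units it can produce per day}
    """
    days = sorted({day for per_day in demand.values() for day in per_day})
    report = {}
    for day in days:
        load = {c: demand[c].get(day, 0) for c in demand}
        over = {c: load[c] - capacity[c] for c in load if load[c] > capacity[c]}
        spare = {c: capacity[c] - load[c] for c in load if load[c] < capacity[c]}
        if over:
            report[day] = {
                "over": over,
                "spare": spare,
                # False means peaks coincide and the overflow cannot be covered
                "covered": sum(spare.values()) >= sum(over.values()),
            }
    return report


demand = {"A": {1: 120, 2: 80}, "B": {1: 70, 2: 130}, "C": {1: 90, 2: 95}}
capacity = {"A": 100, "B": 100, "C": 100}
report = overload_days(demand, capacity)
print(report[1]["covered"], report[2]["covered"])  # True False
```

On day 2, B's 30-unit overflow exceeds the 25 units of spare capacity left in A and C, which is exactly the kind of concurrent peak the slide describes.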
23. What about cross-platform portability of the final product?
Wolf Geiger (1992): Data is only as good as the format in which it is presented to the person who has to use it. If it is not in a format that they can use, there is no point in spending the time to do any of the processing.
24. Real-World Example:
• Asked the end user to write down exactly what they wanted in the report.
• Asked the manager to write down exactly what they wanted in the report.
• Asked the computer programmer to write down exactly what the clients wanted in the report.
• Two of three matched. Which one did not?
25. Cell Phone Data: How should it be parsed?
• Because of the volume of the data, processing has to start on supercomputers, but it has to end in PC formats!
• Object-oriented DB with full variable-length fields.
• Needs multi-dimensional processing:
– Computational linguistics.
• Analysis of word stressors.
• Analysis of grammatical syntax.
– Cognitive focus (topic basis).
– Recognized vocal stress vs. topic.
– Risk factor assignment.
– Background noise assessment.
– Probability analysis of each of the factors to determine further review.
• Data presentation tools have to use a format that is already in common use, so that everyone knows where to look to find the important information.
• Cross-platform portability!!!!
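The final step on this slide, probability analysis of the factors to decide on further review, could be sketched as a weighted logistic score. This is an assumed model, not one the slides specify. Each factor is assumed to be already reduced to a score between 0 and 1, and the factor names, weights, bias, and 0.5 threshold are placeholders that would have to be fitted to labelled data.

```python
import math

# Placeholder weights: the slide lists the factors but not how they are
# weighted or combined, so every number here is hypothetical.
WEIGHTS = {
    "word_stressors": 1.2,
    "grammatical_syntax": 0.8,
    "cognitive_focus": 1.5,
    "vocal_stress_vs_topic": 1.0,
    "background_noise": 0.3,
}
BIAS = -2.0


def review_probability(scores):
    """Combine per-factor scores in [0, 1] with a logistic function.

    Factors missing from scores are treated as 0.
    """
    z = BIAS + sum(w * scores.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def needs_review(scores, threshold=0.5):
    """Flag a call for further review when the probability reaches threshold."""
    return review_probability(scores) >= threshold


quiet_call = {}
flagged_call = {name: 1.0 for name in WEIGHTS}
print(needs_review(quiet_call), needs_review(flagged_call))  # False True
```

Keeping the output a plain probability per call makes it easy to export to PC-friendly formats such as a spreadsheet column, in line with the portability point above.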