CNS and Big Data
At CNS we are developing a holistic artificial intelligence cyber system. The system will be
composed of seven intertwined modules/elements: Memory, Sensation, Perception, Reasoning,
Thought, Consciousness, and Decision Making and Action.
We are well aware that the task is monumental and improbable, and that is exactly the point.
At CNS we believe that in trying to make the impossible possible, we must stretch our minds
to their limit, think unlike anyone else, be original and inventive, and devise solutions that
challenge the boundaries of what we know.
Case study: Big Data
Big Data on a teaspoon:
Big data is a set of approaches, methods and tools that require new ways to uncover critical
hidden information in data sets of massive scale. Big data usually involves data sets whose
sizes are beyond the ability of commonly used tools to process and analyse within a practical
and acceptable lead time. Big data is growing fast: since 2012, data sets have grown from tens
of terabytes to petabytes today.
Challenges of Big Data:
The key problem of Big Data is that it is growing faster than computation speed improves under
Moore's law. This problem will only get worse in the years to come, in particular with the next
generation of data sources such as gene sequencers, NMR imaging, social media, the internet of
everything and future unknowns. It is important to note that, when dealing with Big Data, there
are two crucial challenges. The first is to identify robust methods to extract the critical
information needed from the Big Data set or, to put this in layman's terms, to find a needle in
a haystack. The second is to develop solutions that enable fast computation on Big Data, in
particular when the data is growing faster than the computation rate. To deal with these
challenges effectively, any proposed solutions and tools should be able to transform Big Data
sets into Small Data sets while retaining all the relevant information and, ideally,
eliminating data noise.
How to transform Big Data sets into Small Data sets while retaining all the information
One of the most effective and well-established approaches to dealing with Big Data is known as
“Statistics”. A good representative sample of the Big Data set, in conjunction with the correct
use of statistical methods and tools, can extract the vital information needed to answer our
questions within a stated confidence level and margin of error.
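As a minimal sketch of the statistical approach described above, the following estimates the mean of a large data set from a simple random sample and reports a margin of error. The synthetic data, the function name estimate_mean and the choice of a normal approximation with z = 1.96 (roughly a 95% confidence level) are assumptions made for illustration only.

```python
import math
import random
import statistics


def estimate_mean(population, sample_size, z=1.96, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns (sample_mean, margin_of_error). With z = 1.96 the margin
    corresponds to roughly a 95% confidence level (normal approximation).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, z * std_err


# Hypothetical stand-in for a big data set: synthetic measurements.
data_rng = random.Random(42)
population = [data_rng.gauss(100.0, 15.0) for _ in range(200_000)]

# Only 10,000 of the 200,000 values are inspected.
mean, margin = estimate_mean(population, sample_size=10_000)
print(f"estimated mean = {mean:.2f} +/- {margin:.2f}")
```

The margin of error shrinks with the square root of the sample size, so quadrupling the sample roughly halves the margin.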
But what happens when statistics is not the appropriate approach, or the typology of the
problem is not suited to statistical methods?
At CNS we have developed a groundbreaking method, and the accompanying tools, which under
certain conditions (ill-posed problems) can reduce Big Data size by the square root of the data
set dimension (for example, a set of 10^9 data records is reduced to ~3 × 10^4), vaporizing the
haystack (the Big Data set) while leaving the needle (the information) intact and free of
noise. The method and tools have been tested and the proof of concept has been established.
The mathematical approach and proposed algorithms produce information reconstructions of
greater quality than other existing methods, but at a one-off cost in convergence time.
However, once the data has been transformed, the manipulation and analysis time is reduced
significantly: our experimental results showed a reduction in processing time by a factor of
50. Another pronounced benefit of this approach is the ability to reconstruct the information
with a high level of quality and completeness, regardless of the data structure or data size
(great news for cloud computing). Although we have achieved notable results in the tests so
far, additional experiments are planned to further validate this approach.
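The trade-off between a one-off convergence cost and faster analysis afterwards can be illustrated with a small break-even calculation. This is only a sketch: the function name break_even_runs and the example timings are hypothetical, and only the factor of 50 comes from the text.

```python
import math


def break_even_runs(transform_cost, run_time, speedup=50.0):
    """Number of analysis runs after which the one-off transform pays off.

    transform_cost: one-off convergence time (hypothetical units, e.g. hours)
    run_time: time of one analysis run on the original data (same units)
    speedup: factor by which a run is faster on the reduced data
    """
    saved_per_run = run_time - run_time / speedup
    return math.ceil(transform_cost / saved_per_run)


# Hypothetical timings, chosen only to illustrate the trade-off.
runs = break_even_runs(transform_cost=100.0, run_time=5.0)
print(runs)
```

With these assumed timings the transformation pays for itself after 21 analysis runs. The larger the one-off cost relative to a single run, the more runs are needed before the reduced data set is the cheaper option.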
For our tests, we used data from NMR experiments and were consistently able to reduce the
original data sets from an average of 750 Gb to an average of 0.045 Gb, a factor of
~1.7 × 10^4, without loss of information, while eliminating the data noise. At present, we are
working on improving the method to reduce the data set size even further. A paper with the
preliminary results will be published by the end of July this year.
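A quick arithmetic check of the scales involved, assuming that "reduced by the square root" means going from N records to about sqrt(N) records. The variable names are illustrative, and the sizes are the averages quoted in the text, in the units given there.

```python
import math

records = 10**9
# Assumption: a set of N records is reduced to about sqrt(N) records.
reduced_records = math.sqrt(records)

original_size = 750.0  # average original data set size quoted in the text
reduced_size = 0.045   # average reduced data set size quoted in the text
factor = original_size / reduced_size

print(f"sqrt(10^9) is about {reduced_records:.3g} records")
print(f"size reduction factor is about {factor:.3g}")
```

Both the square root of 10^9 and the ratio of the two quoted sizes come out on the order of 10^4.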
