Data Mining with Big Data
By: Pouya Otarod
Spring 2014
What is …?
• Data Mining
‣ the computational process of discovering patterns in
large data sets
• Big Data
‣ a term for a collection of data sets so large
and complex that it becomes difficult to process
‣ data is growing exponentially, in both structured and
unstructured forms
How Much Data Exists?
• 2.5 quintillion bytes of data are created
EVERY DAY
• IBM: 90 percent of the data in the world today was
produced within the past two years
• Forms of data?
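To put the daily figure in more familiar units, the short sketch below converts it to exabytes per day and terabytes per second. It assumes "quintillion" means 10^18 and uses decimal (SI) units; the variable names are illustrative only.

```python
# Unit conversion for the "2.5 quintillion bytes per day" figure.
# Assumptions: quintillion = 10**18 (short scale), decimal SI units.
BYTES_PER_DAY = 2.5e18
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

exabytes_per_day = BYTES_PER_DAY / 1e18                         # 1 EB = 10**18 bytes
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e12   # 1 TB = 10**12 bytes

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

Under these assumptions the figure works out to 2.5 exabytes per day, or roughly 29 terabytes every second.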
Big Data Examples
• October 4th, 2012, the first presidential debate
• Flickr and its photos
Problem…!
• Data has grown tremendously
• This large amount of data is beyond the ability of
software tools to manage
• Exploring the large volume of data and extracting
useful information and knowledge is challenging,
and sometimes almost infeasible
HACE Theorem
• Heterogeneous, Autonomous, Complex, Evolving
• Big Data starts with large-volume, heterogeneous,
autonomous sources with distributed and
decentralized control, and seeks to explore
complex and evolving relationships among data
• These are the characteristics of Big Data
• The theorem is a way to model Big Data characteristics
• Huge Data with heterogeneous and diverse
dimensionality
‣ represents a huge volume of data
• Autonomous sources with distributed and
decentralized control
‣ a main characteristic of Big Data
• Complex and evolving relationships
Data Mining Challenges with Big Data
• Big Data Mining Platform
• Big Data Semantics and Application Knowledge
I. Information Sharing and Data Privacy
II. Domain and Application Knowledge
• Big Data Mining Algorithm
I. Local Learning and Model Fusion for Multiple
Information Sources
II. Mining from Sparse, Uncertain, and Incomplete Data
III. Mining Complex and Dynamic Data
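As a rough illustration of "Local Learning and Model Fusion for Multiple Information Sources", the sketch below lets each site learn a small model from its own data and share only that model, which a central step then fuses. The choice of a nearest-centroid classifier, the count-weighted fusion rule, and all names (`fit_local`, `fuse`, `predict`, `site_a`, `site_b`) are assumptions made for this example; the slides do not prescribe a particular method.

```python
from collections import defaultdict

def fit_local(samples):
    """Local learning: per-class (count, centroid) from one site's own data.

    samples: list of (feature_vector, label) pairs held at a single site.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {label: (counts[label], [s / counts[label] for s in sums[label]])
            for label in sums}

def fuse(local_models):
    """Model fusion: count-weighted average of the local centroids.

    Only the small local models are combined; raw records never leave a site.
    """
    weighted, totals = {}, {}
    for model in local_models:
        for label, (n, centroid) in model.items():
            if label not in weighted:
                weighted[label] = [0.0] * len(centroid)
                totals[label] = 0
            weighted[label] = [w + n * c for w, c in zip(weighted[label], centroid)]
            totals[label] += n
    return {label: [w / totals[label] for w in weighted[label]]
            for label in weighted}

def predict(global_model, features):
    """Assign the label whose fused centroid is closest to the features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(global_model, key=lambda label: sq_dist(global_model[label]))

# Two hypothetical autonomous sources, each with its own small data set.
site_a = [([1.0, 1.0], "low"), ([2.0, 1.0], "low"), ([8.0, 9.0], "high")]
site_b = [([9.0, 8.0], "high"), ([10.0, 10.0], "high"), ([0.0, 2.0], "low")]

global_model = fuse([fit_local(site_a), fit_local(site_b)])
print(global_model)
print(predict(global_model, [7.5, 8.0]))
```

With this particular fusion rule the fused centroids equal the centroids that would be computed on the pooled data, so nothing is lost by keeping the data distributed. More complex models generally do not fuse this cleanly.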
Thanks for your attention!
