Previously known as
Think Big. Move Fast.
SolidQ
• Born in 2002 in USA and Spain
• Established in 2007 in Italy
• More than 1000 customers and more than 200 consultants worldwide
• Dedicated to Data Management on the Microsoft Platform
• Book Authors, Conference Speakers, SQL Server MVPs and Regional Directors
• www.solidq.com
Davide Mauri
• 18 Years of experience on the SQL Server Platform
• Specialized in Data Solution Architecture, Database Design, Performance
Tuning, Business Intelligence
• Microsoft SQL Server MVP
• President of UGISS (Italian SQL Server UG)
• Mentor @ SolidQ
• Video, Book & Article Author
• Regular Speaker @ SQL Server events
• Projects, Consulting, Mentoring & Training
Data Science
Renaissance 2.0
“Companies are collecting
mountains of information about
you, to predict how
likely you are to buy a product,
and using that knowledge to
craft a marketing message
precisely calibrated to get you to
do so”
Data Science
• Extraction of knowledge from data
• So, what’s new?
• Nothing, except that it’s now economical and fast.
• It’s now applicable to everything, and a lot of data is produced every day
that can be used to extract knowledge
Data Science
Data -> Information -> Knowledge -> Decisions
Data Science
• A Sum Of
• Statistics
• Mathematics
• Machine Learning
• Data Mining
• Computer Programming
• Data Engineering
• Visualization
• Data Warehousing
• High Performance Computing
• To support (Informed) Decision Making
• Data-Driven Decisions
Data Scientist
• IBM
• A data scientist represents an evolution from the business or data analyst role.
• The formal training is similar, with a solid foundation typically in computer science and
applications, modeling, statistics, analytics and math.
• What sets the data scientist apart is strong business acumen, coupled with the ability to
communicate findings to both business and IT leaders in a way that can influence how
an organization approaches a business challenge.
• It's almost like a Renaissance individual who really wants to learn and bring change to
an organization.
• Algorithms are the new gatekeepers
• They decide
• What we find
• What we see
• What we buy
Modern Data Environment
Master
Data
EDW
Data Mart
Big Data
Unstructured
Data
BI Environment
Analytics Environment
Structured
Data
Big Data
The 3 V
No, the 4 V!!!
No, no, the 5 V!!!!!
http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Big Data
• Volume, Velocity, Variety, Veracity ... V<your-v-here>
• Data sets with sizes beyond the ability of commonly used software tools
to capture, curate, manage, and process the data within a tolerable elapsed
time
• Grid Computing, Parallel Computing needed
• keep processing time reasonable
• provide scalability
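As a rough illustration of why parallel computing keeps processing time reasonable, the sketch below splits a data set into chunks, counts words in each chunk independently, and merges the partial results. Everything here is an assumption made for illustration: the function names are hypothetical, and a thread pool on one machine merely stands in for the workers of a real grid or cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines, n_chunks):
    """Split a list of lines into at most n_chunks roughly equal parts."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_word_count(lines, n_workers=4):
    """Count words by processing chunks independently, then merging."""
    chunks = split_into_chunks(lines, n_workers)
    # The thread pool is a stand-in for separate machines in a cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Reduce step: merge the partial counts into one result.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = [
    "big data is big",
    "data is the new resource",
    "never throw data away",
]
totals = parallel_word_count(lines, n_workers=2)
print(totals.most_common(3))
```

Because each chunk is processed without looking at the others, adding workers is what provides scalability as the data grows.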
Big Data Data
• Paradigm: “Store Now, Figure Out Later”
• Data is the new resource. Never throw it away!
• Unstructured Data
• Text Files
• Images
• Sounds
• Structured/Semi Structured Data
• Sensors
• Transactions
• Logs
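One way to picture "Store Now, Figure Out Later" is to keep every record in its raw form and apply a structure only when a question comes up. The sketch below is a minimal, assumed example: the in-memory list stands in for a distributed file system, and the record fields (`source`, `id`, `temp_c`) are made up for illustration.

```python
import json

# Stand-in for a raw data store (in practice, files on a distributed file system).
raw_store = []


def store(record):
    """Store now: keep the record as raw JSON text, with no schema enforced."""
    raw_store.append(json.dumps(record))


store({"source": "sensor", "id": "s1", "temp_c": 21.5})
store({"source": "log", "level": "ERROR", "msg": "disk full"})
store({"source": "sensor", "id": "s2", "temp_c": 19.0})


def sensor_readings(lines):
    """Figure out later: apply a structure only when the data is read."""
    for line in lines:
        record = json.loads(line)
        if record.get("source") == "sensor" and "temp_c" in record:
            yield record["id"], float(record["temp_c"])


readings = list(sensor_readings(raw_store))
average_temp = sum(temp for _, temp in readings) / len(readings)
print(readings, average_temp)
```

The log record is never thrown away; it is simply ignored by this particular question and remains available for the next one.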
Data Storage
• RDBMS
• SQL Server
• Hadoop
• HDInsight
• Hortonworks Data Platform
• Distributed File (Eco)System
• CSV
• JSON
• *.*
Data Storage
• Hadoop Ecosystem
http://hortonworks.com/hadoop-modern-data-architecture/
Data Science & Big Data
• Data Science != Big Data
• Data Science is not only for Big Data
• Data Science can be applied to Big Data
• Data Science starts from Small Data
• 1) find the algorithm that extracts knowledge
• 2) measure the algorithm's results in terms of probability
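Measuring results "in terms of probability" could look like the sketch below: compare an algorithm's predictions with known answers on a small sample, then report the accuracy together with an interval showing how uncertain that estimate is. The Wilson score interval is one common choice picked here for illustration, not something the slides prescribe, and the sample labels are made up.

```python
import math


def accuracy(predicted, actual):
    """Fraction of predictions that match the known answers."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical labels from a small test sample (1 = positive, 0 = negative).
actual = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

acc = accuracy(predicted, actual)
n_correct = sum(p == a for p, a in zip(predicted, actual))
low, high = wilson_interval(n_correct, len(actual))
print(f"accuracy={acc:.2f}, interval=({low:.2f}, {high:.2f})")
```

On small data the interval is wide, which is a useful reminder of how much (or how little) the sample supports the claim before scaling up.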
Machine Learning
• Machine learning, a branch of artificial intelligence, concerns the construction
and study of systems that can learn from data. (Wikipedia)
• For example, a machine learning system could be trained on email messages to learn to
distinguish between spam and non-spam messages. After learning, it can then be used
to classify new email messages into spam and non-spam folders.
• Flavors
• Supervised
• Unsupervised
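The spam example above is a supervised problem: the system learns from messages that already carry a label. A minimal sketch follows, assuming a naive Bayes classifier with add-one smoothing written with the standard library only; the training messages and function names are invented for illustration, and a real project would more likely use a library such as Scikit-Learn.

```python
import math
from collections import Counter, defaultdict


def train(messages):
    """Learn word frequencies per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # prior
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (n_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("win a free prize now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch with the team", "ham"),
    ("project report attached", "ham"),
]
model = train(training)
print(classify("free money now", model))
print(classify("team meeting tomorrow", model))
```

An unsupervised approach would instead receive the messages without labels and try to group similar ones on its own.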
Data Analysis
• Common Data Scientist Tools
• R
• Weka
• Octave
• Scikit-Learn
• Common Data Scientist Languages
• Python
• Scala
• F#
Resources
• https://www.coursera.org/
• Data Scientist Specialization
• https://www.khanacademy.org/
• Math
• http://www.osservatori.net/business_intelligence
• Italian Big Data Market Analysis Resources
• http://www.solidq.com/consulting/
• Data Science Services
• Big Data / Business Intelligence / Data Warehousing