3. SolidQ
• Founded in 2002 in the USA and Spain
• Established in 2007 in Italy
• More than 1000 customers and more than 200 consultants worldwide
• Dedicated to Data Management on the Microsoft Platform
• Book Authors, Conference Speakers, SQL Server MVPs and Regional Directors
• www.solidq.com
4. Davide Mauri
• 18 years of experience on the SQL Server Platform
• Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence
• Microsoft SQL Server MVP
• President of UGISS (Italian SQL Server User Group)
• Mentor @ SolidQ
• Video, Book & Article Author
• Regular Speaker @ SQL Server events
• Projects, Consulting, Mentoring & Training
6. “Companies are collecting mountains of information about you, to predict how likely you are to buy a product, and using that knowledge to craft a marketing message precisely calibrated to get you to do so”
7. Data Science
• Extraction of knowledge from data
• So, what’s new?
• Nothing, except that it is now economical and fast.
• It can now be applied to almost everything, and we produce large amounts of data every day from which knowledge can be extracted.
9. Data Science
• A combination of
• Statistics
• Mathematics
• Machine Learning
• Data Mining
• Computer Programming
• Data Engineering
• Visualization
• Data Warehousing
• High Performance Computing
• To support (Informed) Decision Making
• Data-Driven Decisions
10. Data Scientist
• IBM's definition:
• A data scientist represents an evolution from the business or data analyst role.
• The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math.
• What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge.
• It's almost like a Renaissance individual who really wants to learn and bring change to an organization.
11. • Algorithms are the new gatekeepers
• They decide
• What we find
• What we see
• What we buy
15. Big Data
• Volume, Velocity, Variety, Veracity… V<your-v-here>
• Data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time
• Grid Computing and Parallel Computing are needed to:
• keep processing time reasonable
• provide scalability
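One common way to keep processing time reasonable as data grows is the MapReduce programming model used by Hadoop: split the input into chunks, process each chunk independently (map), group the intermediate results by key (shuffle), then combine them (reduce). The sketch below is a single-machine illustration of that model, assuming a word-count task and hypothetical function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`); it is not Hadoop's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=4):
    # Each chunk is mapped independently, so map tasks can run concurrently.
    # Threads only illustrate the idea; a cluster framework would ship each
    # map task to a different node, close to where the data is stored.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle_phase(chain.from_iterable(mapped)))


if __name__ == "__main__":
    chunks = [
        "big data needs parallel computing",
        "parallel computing provides scalability",
        "data data data",
    ]
    counts = word_count(chunks)
    print(counts["data"], counts["parallel"])  # 4 2
```

Scalability comes from the map step: adding chunks adds independent tasks rather than making one task longer. In CPython, threads do not speed up CPU-bound work, which is one reason real systems distribute tasks across processes and machines instead.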
16. Big Data: The Data
• Paradigm: “Store Now, Figure Out Later”
• Data is the new resource. Never throw it away!
• Unstructured Data
• Text Files
• Images
• Sounds
• Structured/Semi-Structured Data
• Sensors
• Transactions
• Logs
17. Data Storage
• RDBMS
• SQL Server
• Hadoop
• HDInsight
• Hortonworks Data Platform
• Distributed File (Eco)System
• CSV
• JSON
• *.*
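When raw files such as CSV and JSON are stored as they arrive ("Store Now, Figure Out Later"), structure is applied only when the data is read, an approach often called schema-on-read. The sketch below assumes hypothetical sensor readings in one CSV file and one JSON Lines file (inlined as strings so the example is self-contained) and maps both into the same record shape using only Python's standard `csv` and `json` modules.

```python
import csv
import io
import json

# Hypothetical raw inputs, kept exactly as they were received.
CSV_TEXT = """sensor_id,temperature
s1,21.5
s2,19.0
"""

JSON_LINES = """{"sensor_id": "s3", "temperature": 22.0, "unit": "C"}
{"sensor_id": "s1", "temperature": 21.7}
"""


def read_csv(text):
    # DictReader returns every field as a string; types are applied at read time.
    for row in csv.DictReader(io.StringIO(text)):
        yield {"sensor_id": row["sensor_id"], "temperature": float(row["temperature"])}


def read_json_lines(text):
    for line in text.splitlines():
        if line.strip():
            record = json.loads(line)
            # Keep only the fields needed today; extra keys (e.g. "unit") stay in the raw file.
            yield {"sensor_id": record["sensor_id"], "temperature": float(record["temperature"])}


def load_all():
    """Read both sources into one list of uniformly shaped records."""
    return list(read_csv(CSV_TEXT)) + list(read_json_lines(JSON_LINES))


if __name__ == "__main__":
    for record in load_all():
        print(record)
```

Because the raw files are never rewritten, a later analysis can read the same sources with a different schema (for example, keeping the `unit` field).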
18. Data Storage
• Hadoop Ecosystem
http://hortonworks.com/hadoop-modern-data-architecture/
19. Data Science & Big Data
• Data Science != Big Data
• Data Science is not only about Big Data
• Data Science can be applied to Big Data
• Data Science starts from Small Data:
• 1) find the algorithm that extracts knowledge
• 2) measure the algorithm's results in terms of probability
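Step 2 can be made concrete by comparing an algorithm's predictions with known labels on a small test set. The sketch below assumes a binary classifier and a hypothetical `evaluate` helper; it reports accuracy plus precision and recall, which can be read as estimated conditional probabilities (precision estimates P(truly positive | predicted positive), recall estimates P(predicted positive | truly positive)).

```python
def evaluate(actual, predicted, positive="spam"):
    """Compare true labels with a classifier's predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    correct = sum(1 for a, p in pairs if a == p)
    return {
        # Fraction of all items labeled correctly
        "accuracy": correct / len(pairs),
        # Estimate of P(truly positive | predicted positive)
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Estimate of P(predicted positive | truly positive)
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    actual = ["spam", "spam", "ham", "ham", "spam"]
    predicted = ["spam", "ham", "ham", "spam", "spam"]
    metrics = evaluate(actual, predicted)
    print({name: round(value, 2) for name, value in metrics.items()})
    # {'accuracy': 0.6, 'precision': 0.67, 'recall': 0.67}
```

On a handful of examples these numbers are rough estimates; the same calculation on a larger held-out sample gives a more reliable picture of how the algorithm will behave on new data.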
20. Machine Learning
• Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. (Wikipedia)
• For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders.
• Flavors
• Supervised
• Unsupervised
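The spam example above is a supervised problem: the system learns from messages that are already labeled. The slide does not name an algorithm, so the sketch below swaps in multinomial naive Bayes with add-one (Laplace) smoothing, one classic choice for text classification. The class name `NaiveBayesSpamFilter` and the toy training data are hypothetical, and the code assumes both labels appear in the training set.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (message text, label).
TRAINING_DATA = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule tomorrow", "ham"),
    ("project meeting notes", "ham"),
]


class NaiveBayesSpamFilter:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def train(self, messages):
        """Learn word frequencies per label from (text, label) pairs."""
        for text, label in messages:
            tokens = self.tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label); logs avoid numeric underflow.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denominator = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for token in tokens:
            score += math.log((self.word_counts[label][token] + 1) / denominator)
        return score

    def classify(self, text):
        """Return the label with the highest posterior score."""
        tokens = self.tokenize(text)
        return max(self.LABELS, key=lambda label: self._log_score(tokens, label))


if __name__ == "__main__":
    spam_filter = NaiveBayesSpamFilter()
    spam_filter.train(TRAINING_DATA)
    print(spam_filter.classify("win cheap money"))  # spam
    print(spam_filter.classify("meeting notes tomorrow"))  # ham
```

Smoothing keeps a single unseen word from driving a label's probability to zero. An unsupervised method, by contrast, would receive the messages without labels and look for groups on its own.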
21. Data Analysis
• Common Data Science Tools
• R
• Weka
• Octave
• Scikit-Learn
• Common Data Science Languages
• Python
• Scala
• F#
23. Resources
• https://www.coursera.org/
• Data Science Specialization
• https://www.khanacademy.org/
• Math
• http://www.osservatori.net/business_intelligence
• Italian Big Data Market Analysis Resources
• http://www.solidq.com/consulting/
• Data Science Services
• Big Data / Business Intelligence / Data Warehousing