The Microsoft BigData Story
Deck from my talk at Big Data Tech Con in Boston, April 2013.

Slide notes:
  • SSIS tasks
    – Lookup transformation (this for that, substitutions)
    – Cache transformation (multiple lookups)
    – Fuzzy Lookup (lookup based on threshold matching)
    – Fuzzy Grouping (grouping based on thresholds)
    – Data Mining Query (based on mining model algorithms)
    – DQS Cleansing (uses a KB)
  • Comparison of features from MSDN: http://msdn.microsoft.com/en-us/library/hh212940(v=sql.110).aspx
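The Fuzzy Lookup and Fuzzy Grouping notes describe matching rows against a similarity threshold. Below is a minimal Python sketch of that idea. It is not the SSIS implementation. It uses `difflib.SequenceMatcher` as a stand-in similarity score, whereas SSIS has its own token-based scoring. The `threshold` value and the sample company names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; a stand-in for SSIS's own match scoring (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(value: str, reference: list[str], threshold: float = 0.8):
    """Fuzzy Lookup idea: return the best reference row whose score meets
    the threshold, or None when nothing is close enough."""
    best, best_score = None, 0.0
    for candidate in reference:
        score = similarity(value, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


def fuzzy_group(values: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Fuzzy Grouping idea: greedily put each value in the first group whose
    first (canonical) member scores at or above the threshold."""
    groups: list[list[str]] = []
    for value in values:
        for group in groups:
            if similarity(value, group[0]) >= threshold:
                group.append(value)
                break
        else:  # no existing group was close enough, so start a new one
            groups.append([value])
    return groups


# Hypothetical usage
print(fuzzy_lookup("Micorsoft", ["Microsoft", "Google"]))  # Microsoft
print(fuzzy_group(["Microsoft", "Micorsoft", "Google", "Gogle"]))
```

The greedy grouping depends on input order. It is only meant to show how a threshold turns near-matches into groups, not to reproduce the grouping SSIS would produce.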

    1. Microsoft’s BigData Story: @LynnLangit, April 2013 – Big Data Tech Con
    2. Data Expertise / Lynn Langit
       • Industry awards
         – Microsoft: MVP for SQL Server
         – Google: GDE for Cloud Platform
         – 10Gen: Master for MongoDB
       • Practicing architect
       • Technical author / trainer
         – Pluralsight: Google Cloud series
         – DevelopMentor: SQL Server series
         – 2 books on SQL Server BI
       • Former MSFT FTE (4 years)
    3. In a Relationship? BigData, NoSQL
    4. BigData, NoSQL… => No Microsoft?
       • Big Data => keeping / getting more data
         – Cheap storage
         – Cloud storage
         – Open source data projects (Hadoop)
       • NoSQL => schema-lite, scalable storage
         – NoSQL data projects
         – Mostly open source
         – Sharded replicas
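Slide 4 describes NoSQL storage as sharded replicas. Here is a minimal sketch of the general idea. It assumes simple hash-modulo sharding and made-up node names. Real systems such as MongoDB use their own chunking and replica-set machinery.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the shard key.
    (Python's built-in hash() is salted per process for str, so it is avoided.)"""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def placement(key: str, num_shards: int, replicas: int) -> list[str]:
    """Hypothetical node names: every shard is copied to `replicas` nodes."""
    shard = shard_for(key, num_shards)
    return [f"shard{shard}-replica{r}" for r in range(replicas)]


# Spread some hypothetical keys over 4 shards with 3 copies each.
for key in ["user:1", "user:2", "user:42"]:
    print(key, placement(key, num_shards=4, replicas=3))
```

Plain modulo hashing moves most keys when `num_shards` changes. This is one reason production systems prefer range-based chunks or consistent hashing.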
    5. In a (Open Source) Relationship?
       • NoSQL: MongoDB, Neo4j, Riak, Cassandra
       • Hadoop
       • Cloud: AWS, Heroku, RackSpace, OpenStack
    6. Data Services DEMO: HDInsight (Hadoop)
    7. The Reality: BigData / Small / BigData
    8. BigData Lifecycle Management: Locate, Quantify, Qualify, Replicate, Process, Present
    9. Locating the data
       • Private source: you buy it
       • Public source: you find it
       • Your source: in SQL Server, on desktops
    10. Finding Data in Data Markets
       • Windows Azure Data Market
       • DataMarket.com
       • Factual.com
       • InfoChimps
    11. Data Services DEMO: Azure DataMarket
    12. Database Lifecycle Management
       • Evaluating current processes
       • Improving processes
       • Adding new tools (SSDT)
       • Data synchronization processes
    13. Storing the data
       • Relational
         – SQL Server: can use partitioning for scalability
       • Beyond relational via relational
         – Specialized data types: XML, Hierarchy, Filestream/FileTable, Geospatial
         – Columnstore index
       • Multi-dimensional / in-memory
         – OLAP cubes / mining models
         – Tabular models
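Slide 13 notes that SQL Server can use partitioning for scalability. A partition function maps each value to a partition using sorted boundary values. The Python sketch below mimics the RANGE LEFT / RANGE RIGHT rule to show which partition a boundary value lands in. It is an illustration only: in SQL Server the engine does this mapping itself and exposes it through `$PARTITION`. The yearly boundaries are hypothetical.

```python
from bisect import bisect_left, bisect_right


def partition_number(value, boundaries, range_right=True):
    """Return the 1-based partition for `value`.

    `boundaries` must be sorted ascending; n boundaries give n + 1 partitions.
    RANGE RIGHT: a boundary value belongs to the partition on its right.
    RANGE LEFT:  a boundary value belongs to the partition on its left.
    """
    if range_right:
        # number of boundaries <= value
        return bisect_right(boundaries, value) + 1
    # number of boundaries < value
    return bisect_left(boundaries, value) + 1


years = [2011, 2012, 2013]  # hypothetical yearly boundaries
print([partition_number(y, years) for y in (2010, 2011, 2013, 2014)])
# RANGE RIGHT -> [1, 2, 4, 4]
print([partition_number(y, years, range_right=False) for y in (2010, 2011, 2013, 2014)])
# RANGE LEFT  -> [1, 1, 3, 4]
```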
    14. Big Data in SQL Server 2012 – Relational Enhancements DEMO: Columnstore, XML, FileTable
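The columnstore demo rests on two ideas. First, a query reads only the columns it needs. Second, values that repeat within a column compress well. The toy sketch below shows both ideas using run-length encoding. It is not SQL Server's segment format, which uses its own encodings. The rows and column names are hypothetical.

```python
from itertools import groupby


def to_columns(rows, column_names):
    """Pivot row tuples into one list per column (column-wise layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def run_length_encode(values):
    """Compress a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def sum_encoded(encoded):
    """Aggregate directly on the encoded column without expanding it."""
    return sum(value * count for value, count in encoded)


# Hypothetical sales rows: (state, quantity)
rows = [("WA", 10), ("WA", 5), ("WA", 7), ("OR", 3), ("OR", 3)]
columns = to_columns(rows, ["state", "qty"])
print(run_length_encode(columns["state"]))             # [('WA', 3), ('OR', 2)]
print(sum_encoded(run_length_encode(columns["qty"])))  # 28
```

A SUM over `qty` touches only the `qty` column. Run-length encoding pays off most when equal values sit next to each other, for example after sorting.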
    15. Data Processing: Raw data → Pre-processed data → Detail data → Aggregate data → Views
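The processing stages on slide 15 can be sketched as a chain of small functions, where each stage produces the next stage's input. This is a minimal Python illustration built on several assumptions: the comma-separated `RAW` lines, the field names, and summing `amount` per region are all hypothetical.

```python
from collections import defaultdict

RAW = [  # hypothetical raw lines: "region,product,amount"
    " west , widget , 10 ",
    "east,widget,4",
    "west,gadget,6",
    "bad line",
]


def preprocess(raw_lines):
    """Raw -> pre-processed: split, trim, drop malformed lines."""
    for line in raw_lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3:
            yield parts


def to_detail(rows):
    """Pre-processed -> detail: typed records."""
    for region, product, amount in rows:
        yield {"region": region, "product": product, "amount": float(amount)}


def aggregate(details, key):
    """Detail -> aggregate: total amount per key."""
    totals = defaultdict(float)
    for record in details:
        totals[record[key]] += record["amount"]
    return dict(totals)


def view(totals):
    """Aggregate -> view: largest totals first, ready to present."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


print(view(aggregate(to_detail(preprocess(RAW)), "region")))
# [('west', 16.0), ('east', 4.0)]
```

Keeping each stage separate lets the detail data be re-aggregated along a different key (for example `"product"`) without reprocessing the raw lines.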
    16. Valuing the data
       • De-duplicating
       • Validating
       • Correcting errors
       • Aggregating
       • Ranking / rating
         – Social rating, e.g. Yelp-like
         – Social scoring, e.g. Freebase-like
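Several of the steps on slide 16 can be sketched in a few lines. This minimal illustration rests on four assumptions: hypothetical business records, a hand-made `CORRECTIONS` substitution map, a five-digit ZIP rule for validation, and a plain average with a minimum vote count standing in for a Yelp-like social rating. Tools such as DQS drive these steps from a knowledge base instead.

```python
import re

CORRECTIONS = {"seatle": "seattle"}  # hypothetical substitution list
ZIP = re.compile(r"^\d{5}$")         # hypothetical validation rule


def clean(records):
    """De-duplicate, correct, and validate hypothetical business records."""
    seen, valid, rejected = set(), [], []
    for rec in records:
        city = rec["city"].strip().lower()
        city = CORRECTIONS.get(city, city)          # correcting errors
        key = (rec["name"].strip().lower(), city)
        if key in seen:                             # de-duplicating
            continue
        seen.add(key)
        if ZIP.match(rec["zip"]):                   # validating
            valid.append({**rec, "city": city})
        else:
            rejected.append(rec)
    return valid, rejected


def rank(ratings, min_votes=2):
    """Rank items by average rating, skipping items with too few votes."""
    averages = {item: sum(v) / len(v) for item, v in ratings.items() if len(v) >= min_votes}
    return sorted(averages, key=averages.get, reverse=True)


records = [
    {"name": "Cafe A", "city": "Seatle", "zip": "98101"},
    {"name": "cafe a", "city": "Seattle", "zip": "98101"},  # duplicate after correction
    {"name": "Cafe B", "city": "Portland", "zip": "9720"},  # invalid ZIP
]
valid, rejected = clean(records)
print(len(valid), [r["name"] for r in rejected])           # 1 ['Cafe B']
print(rank({"A": [5, 4], "B": [3, 3, 3], "C": [5]}))      # ['A', 'B']
```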
    17. Data Services DEMO: Data Quality Services
    18. Types of Data Quality Projects
       • T-SQL scripts (boolean match)
         – Exact matches: WHERE =, WHERE <>, IN
         – LIKE string matching with %
       • Full-text matching (semantic word match)
         – CONTAINS
       • Semantic Search (semantic phrase match)
         – SEMANTICSIMILARITYTABLE
       • SSIS tasks (transactional, multi-valued matching)
         – Listed in the slide notes
       • DQS (KB matching)
         – KnowledgeBase rules/matches
         – Data Quality project: clean / correct data
       • MDS (one view of truth)
         – Versioned entities, attributes and rules
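The T-SQL row on slide 18 covers exact `=`, `<>` and `IN` matches plus `LIKE` with `%`. These can be tried without a SQL Server instance. The sketch below uses Python's built-in `sqlite3` module as a substitute. These simple predicates behave similarly in SQLite, although in SQL Server case sensitivity depends on the column's collation. `CONTAINS` and `SEMANTICSIMILARITYTABLE` need SQL Server full-text and semantic indexes and have no counterpart here. The `customer` table and the names are hypothetical.

```python
import sqlite3


def matches(names):
    """Run boolean-style and LIKE matches against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT)")
    conn.executemany("INSERT INTO customer (name) VALUES (?)", [(n,) for n in names])

    def query(sql, params=()):
        return [row[0] for row in conn.execute(sql, params)]

    result = {
        # exact match
        "exact": query("SELECT name FROM customer WHERE name = ?", ("Contoso Ltd",)),
        # everything except an exact value
        "not_equal": query("SELECT name FROM customer WHERE name <> ? ORDER BY name", ("Contoso Ltd",)),
        # membership in a list of values
        "in_list": query("SELECT name FROM customer WHERE name IN (?, ?) ORDER BY name",
                         ("Fabrikam", "Contoso Ltd")),
        # prefix pattern with the % wildcard
        "like": query("SELECT name FROM customer WHERE name LIKE ? ORDER BY name", ("Contoso%",)),
    }
    conn.close()
    return result


print(matches(["Contoso Ltd", "Contoso Limited", "Fabrikam"]))
```

The `LIKE` query catches "Contoso Limited", which the exact match misses. This is the gap that full-text, semantic, and DQS matching aim to close.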
    19. Data Presentation
       • View-only client
       • View & manipulate (hide-only) client
       • View & query (aggregate) client
       • View & query (drill-through) client
       • View & mash-up (add new data) client
       • View & update client
       • Timeliness of data (latency)
       • Beauty of data
    20. But does it work in Excel?
       • Import 3rd party data – including Hadoop via ODBC
       • Mash-up data with PowerPivot
       • Clean up data with Data Quality Services
       • Extract-Transform-Load with Data Explorer
       • Authorize with Master Data Services
       • Mine with Predixion
    21. From Pivot Tables to Visualized Data Mash-ups with Mining DEMO: The Power of Excel
    22. What about the UDM?
       • UDM / Data Mining is fully supported in SSAS
       • SSAS must be installed in this (Multidimensional and Data Mining) mode
         – Mutually exclusive with Tabular mode
       • But should you still use it?
    23. Big Data in SQL Server 2012 – Non-Relational Features DEMO: Tabular Models, Data Mining
    24. Data Consumability
       • Valid (accurate)
       • Recognizable (meaningful)
       • Appropriate (useful)
       • Beautiful (appealing)
       • Enjoyable (satisfying)
    25. PowerView for Tabular Models DEMO: PowerView
    26. Data Fluency and Job Roles
       • Consumer: view and understand
       • Analyzer: view, manipulate and decide
       • Cleaner: validate and update
       • Artist: visualize and present
    27. BigData in SQL Server 2012
       • Relational engine
         – Scaling via partitioning for tables and indexes, and via PDW
         – Columnstore indexes
         – Special data types: XML, Hierarchy, FileTable
       • Analysis Services engines
         – OLAP cubes
         – Tabular models
         – Data mining models
       • Other services
         – Data Quality Services
         – Master Data Services
         – StreamInsight
    28. Other Data Services from Microsoft
       • Windows Azure Marketplace
       • SQL Azure
       • Data Explorer
       • PowerPivot
    29. NoSQL – New Products / Betas
       • SSRS on Azure
       • Semantic Search
       • HDInsight (Hadoop on Azure)
       • PowerView
       • Cloud-based Data Explorer
    30. Announced Futures
    31. The Changing Data Landscape: RDBMS, NoSQL, Other Services
    32. www.TeachingKidsProgramming.org
       • … recipes)
       • Free courseware
       • Do a Recipe, Teach a Kid (ages 10+)
       • Java or Microsoft SmallBasic
       • C# on Pluralsight
    33. Toward Data Craftsmanship…
       • Follow me
         – @LynnLangit
         – www.LynnLangit.com
         – YouTube: SoCalDevGal
       • Hire me
         – To help build your BI/Big Data solution
         – To teach your team next-gen BI
         – To learn more about using NoSQL solutions