The Microsoft BigData Story

Deck from my talk at Big Data Tech Con in Boston, April 2013

Published in: Technology
  • SSIS tasks: Lookup transformation (this for that, substitutions); Cache transformation (multiple lookups); Fuzzy Lookup (lookup based on threshold matching); Fuzzy Grouping (grouping based on thresholds); Data Mining Query (based on mining model algorithms); DQS Cleansing (uses a KB)
  • Comparison of features from MSDN -- http://msdn.microsoft.com/en-us/library/hh212940(v=sql.110).aspx
  • Lynn
  • The Microsoft BigData Story

    1. Microsoft’s BigData Story • @LynnLangit • April 2013 – Big Data Tech Con
    2. Data Expertise / Lynn Langit • Industry awards – Microsoft: MVP for SQL Server – Google: GDE for Cloud Platform – 10Gen: Master for MongoDB • Practicing Architect • Technical author / trainer – Pluralsight: Google Cloud Series – DevelopMentor: SQL Server Series – 2 books on SQL Server BI • Former MSFT FTE: 4 years
    3. In a Relationship? BigData • NoSQL
    4. BigData, NoSQL… => No Microsoft? Big Data => keeping / getting more data: • Cheap Storage • Cloud Storage • Open Source data projects (Hadoop). NoSQL => schema-lite, scalable storage: • NoSQL data projects • Mostly open source • Sharded replicas
    5. In a (Open Source) Relationship? NoSQL • Hadoop • Cloud • MongoDB • Neo4j • Riak • AWS • Heroku • RackSpace • OpenStack • Cassandra
    6. Data Services • DEMO: HDInsight (Hadoop)
    7. The Reality: BigData, Small BigData
    8. BigData Lifecycle Management: Locate • Quantify • Qualify • Replicate • Process • Present
    9. Locating the data • you buy it: Private source, Public source • you find it: Your source • in SQL Server • on desktops
    10. Finding Data in Data Markets • Windows Azure Data Market • DataMarket.com • Factual.com • InfoChimps
    11. Data Services • DEMO: Azure DataMarket
    12. Database Lifecycle Management • Evaluating current processes • Improving processes • Adding new tools – SSDT • Data synchronization processes
    13. Storing the data • Relational: SQL Server (can use partitioning for scalability) • Beyond relational via relational: specialized data types (XML, Hierarchy, Filestream/Filetable, Geospatial); Columnstore index • Multi-dimensional / in-memory: OLAP cubes / Mining Models; Tabular models
    14. Big Data in SQL Server 2012 – Relational Enhancements • DEMO: Columnstore, XML, Filetable
    15. Data Processing: Raw data • Pre-processed data • Detail data • Aggregate data • Views
    16. Valuing the data • De-duplicating • Validating • Correcting errors • Aggregating • Ranking / rating – Social rating, e.g. Yelp-like – Social scoring, e.g. Freebase-like
    17. Data Services • DEMO: Data Quality Services
    18. Types of Data Quality Projects • T-SQL scripts (boolean match): exact matches (WHERE =, WHERE <>, IN); LIKE string matching (%) • Full-text matching (semantic word match): CONTAINS • Semantic Search (semantic phrase match): SEMANTICSIMILARITYTABLE • SSIS tasks (transactional, multi-valued matching): see the SSIS tasks list in the slide notes • DQS (KB matching): KnowledgeBase rules/matches; Data Quality project to clean / correct data • MDS (One view of truth): Versioned Entities, Attributes and Rules
    19. Data Presentation • View-only client • View & manipulate (hide-only) client • View & query (aggregate) client • View & query (drill through) client • View & mash-up (add new data) client • View & update client • Timeliness of data (latency) • Beauty of data
    20. But, does it work in Excel? • Import 3rd party Data – including Hadoop via ODBC • Mash-up data with PowerPivot • Clean up data with Data Quality Services • Extract-Transform-Load with Data Explorer • Authorize with Master Data Services • Mine with Predixion
    21. From Pivot tables to Visualized Data Mash-ups with Mining • DEMO: The Power of Excel
    22. What about the UDM? • UDM / Data Mining is fully supported in SSAS • Must be installed in this mode – Mutually exclusive with Tabular mode • But, should you use it anymore?
    23. Big Data in SQL Server 2012 – Non-Relational Features • DEMO: Tabular Models, Data Mining
    24. Data Consumability: (Accurate) Valid • (Meaningful) Recognizable • (Useful) Appropriate • (Appealing) Beautiful • (Satisfying) Enjoyable
    25. PowerView for Tabular Models • DEMO: PowerView
    26. Data Fluency and Job Roles • Consumer: view and understand • Analyzer: view, manipulate and decide • Cleaner: validate and update • Artist: visualize and present
    27. BigData in SQL Server 2012 • Relational engine: scaling via partitioning for tables and indexes, or PDW; Columnstore indexes; special data types (XML, Hierarchy, Filetable) • Analysis service engines: OLAP Cubes; Tabular Models; Data Mining Models • Other services: Data Quality Services; Master Data Services; StreamInsight
    28. Other Data Services from Microsoft • Windows Azure Marketplace • SQL Azure • Data Explorer • Power Pivot
    29. NoSQL – New Products / Betas • SSRS on Azure • Semantic Search • HDInsight (Hadoop on Azure) • PowerView • Cloud-based Data Explorer
    30. Announced Futures
    31. The Changing Data Landscape: Other Services • RDBMS • NoSQL
    32. • recipes) www.TeachingKidsProgramming.org • Free Courseware • Do a Recipe, Teach a Kid (Ages 10++) • Java or Microsoft SmallBasic • C# on Pluralsight
    33. Toward Data Craftsmanship… • Follow me: @LynnLangit; www.LynnLangit.com; YouTube - SoCalDevGal • Hire me: to help build your BI/Big Data solution; to teach your team next gen BI; to learn more about using NoSQL solutions
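
Slide 18 lists several matching tiers for data quality work: exact boolean matches, LIKE-style string matches, and threshold-based fuzzy lookups. The sketch below illustrates the difference between those three tiers in plain Python. It is not the T-SQL, SSIS, or DQS implementation: the function names, the sample company names, and the 0.8 default threshold are assumptions made for illustration, and the similarity ratio from Python's `difflib` only stands in for whatever scoring Fuzzy Lookup applies.

```python
from difflib import SequenceMatcher
from fnmatch import fnmatchcase

# Hypothetical sample values, invented for this illustration.
NAMES = ["Microsoft", "Microsft", "Micro Soft", "Google"]


def exact_match(value, candidates):
    """Boolean match, analogous in spirit to WHERE col = value."""
    return [c for c in candidates if c == value]


def like_match(pattern, candidates):
    """Wildcard match, loosely analogous to LIKE.

    '%' and '_' are translated to the glob wildcards '*' and '?'.
    Matching is case-sensitive; other glob characters such as '['
    keep their glob meaning, which differs from real LIKE semantics.
    """
    glob = pattern.replace("%", "*").replace("_", "?")
    return [c for c in candidates if fnmatchcase(c, glob)]


def fuzzy_match(value, candidates, threshold=0.8):
    """Threshold match: keep candidates whose similarity ratio to
    `value` is at least `threshold` (0.0 to 1.0), ignoring case."""
    results = []
    for c in candidates:
        score = SequenceMatcher(None, value.lower(), c.lower()).ratio()
        if score >= threshold:
            results.append((c, score))
    return results


if __name__ == "__main__":
    print(exact_match("Microsoft", NAMES))
    print(like_match("Micro%", NAMES))
    for name, score in fuzzy_match("Microsoft", NAMES):
        print(f"{name}: {score:.2f}")
```

Raising the threshold trades recall for precision: a stricter value keeps only near-identical spellings, while a looser one admits more variants along with more false positives.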
