The Microsoft BigData Story

Deck from my talk at Big Data Tech Con in Boston, April 2013.

  • SSIS Tasks: Lookup transformation (this for that, substitutions); Cache transformation (multiple lookups); Fuzzy Lookup (lookup based on threshold matching); Fuzzy Grouping (grouping based on thresholds); Data Mining Query (based on mining model algorithms); DQS Cleansing (uses a KB)
  • Comparison of features from MSDN -- http://msdn.microsoft.com/en-us/library/hh212940(v=sql.110).aspx
  • Lynn

Transcript

  • 1. Microsoft’s BigData Story @LynnLangit April 2013 – Big Data Tech Con
  • 2. Data Expertise / Lynn Langit • Industry awards (Microsoft: MVP for SQL Server; Google: GDE for Cloud Platform; 10Gen: Master for MongoDB) • Practicing Architect • Technical author / trainer (Pluralsight: Google Cloud Series; DevelopMentor: SQL Server Series; 2 books on SQL Server BI) • Former MSFT FTE (4 years)
  • 3. In a Relationship? BigData NoSQL
  • 4. BigData, NoSQL… => No Microsoft? Big Data => keeping / getting more data: cheap storage; cloud storage; open source data projects (Hadoop). NoSQL => schema-lite, scalable storage: NoSQL data projects; mostly open source; sharded replicas.
  • 5. In a (Open Source) Relationship? NoSQL Hadoop Cloud MongoDB Neo4j Riak AWS Heroku RackSpace OpenStack Cassandra
  • 6. Data Services DEMO: HDInsight (Hadoop)
  • 7. The Reality BigData Small BigData
  • 8. BigData Lifecycle Management: Locate, Quantify, Qualify, Replicate, Process, Present
  • 9. Locating the data: Private source (you buy it); Public source (you find it); Your source (in SQL Server, on desktops)
  • 10. Finding Data in Data Markets • Windows Azure Data Market • DataMarket.com • Factual.com • InfoChimps
  • 11. Data Services DEMO: Azure DataMarket
  • 12. Database Lifecycle Management • Evaluating current processes • Improving processes • Adding new tools (SSDT) • Data synchronization processes
  • 13. Storing the data. Relational: SQL Server (can use partitioning for scalability). Beyond relational via relational: specialized data types (XML, Hierarchy, Filestream/Filetable, Geospatial); columnstore index. Multi-dimensional / in-memory: OLAP cubes / Mining Models; Tabular models.
  • 14. Big Data in SQL Server 2012: Relational Enhancements. DEMO: Columnstore, XML, FileTable
  • 15. Data Processing: Raw data, Pre-processed data, Detail data, Aggregate data, Views
  • 16. Valuing the data • De-duplicating • Validating • Correcting errors • Aggregating • Ranking / rating (social rating, e.g. Yelp-like; social scoring, e.g. Freebase-like)
  • 17. Data Services DEMO: Data Quality Services
  • 18. Types of Data Quality Projects. T-SQL scripts (boolean match): exact matches with WHERE =, WHERE <>, IN; LIKE string matching with %. Full-text matching (semantic word match): CONTAINS. Semantic Search (semantic phrase match): SEMANTICSIMILARITYTABLE. SSIS tasks (transactional, multi-valued matching): list in the slide notes. DQS (KB matching): KnowledgeBase rules/matches; Data Quality project to clean / correct data. MDS (one view of truth): versioned Entities, Attributes and Rules.
  • 19. Data Presentation • View-only client • View & manipulate (hide-only) client • View & query (aggregate) client • View & query (drill through) client • View & mash-up (add new data) client • View & update client • Timeliness of data (latency) • Beauty of data
  • 20. But, does it work in Excel? Mash-up data with PowerPivot; Clean up data with Data Quality Services; Extract-Transform-Load with Data Explorer; Authorize with Master Data Services; 3rd party: Mine with Predixion; Import Data (including Hadoop via ODBC)
  • 21. From Pivot tables to Visualized Data Mash-ups with Mining. DEMO: The Power of Excel
  • 22. What about the UDM? • UDM / Data Mining is fully supported in SSAS • Must be installed in this mode (mutually exclusive with Tabular mode) • But, should you use it anymore?
  • 23. Big Data in SQL Server 2012: Non-Relational Features. DEMO: Tabular Models, Data Mining
  • 24. Data Consumability: (Accurate) Valid; (Meaningful) Recognizable; (Useful) Appropriate; (Appealing) Beautiful; (Satisfying) Enjoyable
  • 25. PowerView for Tabular Models. DEMO: PowerView
  • 26. Data Fluency and Job Roles. Consumer: view and understand. Analyzer: view, manipulate and decide. Cleaner: validate and update. Artist: visualize and present.
  • 27. BigData in SQL Server 2012. Relational engine: scaling via partitioning for tables and indexes, and PDW; columnstore indexes; special data types (XML, Hierarchy, Filetable). Analysis service engines: OLAP Cubes; Tabular Models; Data Mining Models. Other services: Data Quality Services; Master Data Services; StreamInsight.
  • 28. Other Data Services from Microsoft: Windows Azure Marketplace; SQL Azure; Data Explorer; Power Pivot
  • 29. NoSQL: New Products / Betas. SSRS on Azure; Semantic Search; HDInsight (Hadoop on Azure); PowerView; Cloud-based Data Explorer
  • 30. Announced Futures
  • 31. The Changing Data Landscape: Other Services, RDBMS, NoSQL
  • 32. • recipes) www.TeachingKidsProgramming.org • Free Courseware • Do a Recipe => Teach a Kid (Ages 10 ++) • Java or Microsoft SmallBasic • C# on Pluralsight
  • 33. Toward Data Craftsmanship… Follow me: @LynnLangit • www.LynnLangit.com • YouTube: SoCalDevGal. Hire me: to help build your BI/Big Data solution • to teach your team next gen BI • to learn more about using NoSQL solutions
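
The matching styles listed on slides 16 and 18 (exact/boolean match, LIKE-style string match, and threshold-based fuzzy lookup and de-duplication) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sample values are made up, and the similarity score comes from Python's standard `difflib`, which is a stand-in and not the algorithm that SSIS Fuzzy Lookup or DQS actually uses.

```python
from difflib import SequenceMatcher
from fnmatch import fnmatchcase


def exact_match(value, candidates):
    """Boolean match, roughly analogous to WHERE col = ... or IN (...)."""
    return value in candidates


def like_match(value, pattern):
    """Wildcard match, roughly analogous to LIKE.

    Only '%' is translated (to '*'); other LIKE features are not handled.
    """
    return fnmatchcase(value.lower(), pattern.lower().replace("%", "*"))


def similarity(a, b):
    """Case-insensitive similarity score between 0.0 and 1.0 (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(value, reference, threshold=0.8):
    """Return the closest reference value if it meets the threshold, else None."""
    if not reference:
        return None
    best = max(reference, key=lambda r: similarity(value, r))
    return best if similarity(value, best) >= threshold else None


def fuzzy_dedupe(values, threshold=0.8):
    """Keep the first occurrence of each group of near-duplicate values."""
    kept = []
    for v in values:
        if fuzzy_lookup(v, kept, threshold) is None:
            kept.append(v)
    return kept


if __name__ == "__main__":
    vendors = ["Microsoft", "MongoDB"]  # made-up reference list
    print(exact_match("Hadoop", {"Hadoop", "Riak"}))
    print(like_match("SQL Server 2012", "SQL%"))
    print(fuzzy_lookup("Microsft", vendors))
    print(fuzzy_dedupe(["Microsoft", "Microsft", "MongoDB"]))
```

The threshold of 0.8 is an arbitrary choice for the sketch; in practice it would be tuned against known-good data, which is the role the knowledge base plays in the DQS approach described on slide 18.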