The Microsoft BigData Story

Deck from my talk at Big Data Tech Con in Boston, April 2013.

  • SSIS tasks: Lookup transformation (this for that, substitutions); Cache transformation (multiple lookups); Fuzzy Lookup (lookup based on threshold matching); Fuzzy Grouping (grouping based on thresholds); Data Mining Query (based on mining model algorithms); DQS Cleansing (uses a KB)
  • Comparison of features from MSDN: http://msdn.microsoft.com/en-us/library/hh212940(v=sql.110).aspx
  • Lynn
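The notes above separate an exact Lookup transformation from a Fuzzy Lookup that matches on a similarity threshold. A minimal Python sketch of that distinction follows. The reference table is hypothetical, and difflib's SequenceMatcher ratio stands in for the similarity score as an assumption; SSIS Fuzzy Lookup computes similarity its own way.

```python
from difflib import SequenceMatcher

# Hypothetical reference table: city -> state code (the "that" in "this for that").
REFERENCE = {
    "Seattle": "WA",
    "Boston": "MA",
    "Portland": "OR",
}


def lookup(key, reference=REFERENCE):
    """Exact lookup: return the mapped value, or None when the key has no exact match."""
    return reference.get(key)


def fuzzy_lookup(key, reference=REFERENCE, threshold=0.8):
    """Threshold lookup: return (matched_key, value, score) for the most similar
    reference key if its score is at least `threshold`, otherwise None.

    The score is difflib's SequenceMatcher ratio on lower-cased strings, an
    assumption standing in for the similarity measure SSIS Fuzzy Lookup uses.
    """
    best_key, best_score = None, 0.0
    for candidate in reference:
        score = SequenceMatcher(None, key.lower(), candidate.lower()).ratio()
        if score > best_score:
            best_key, best_score = candidate, score
    if best_key is not None and best_score >= threshold:
        return best_key, reference[best_key], best_score
    return None


print(lookup("Bostn"))        # None: no exact match
print(fuzzy_lookup("Bostn"))  # matches "Boston" above the 0.8 threshold
```

Raising `threshold` makes the fuzzy match stricter; at 1.0 it behaves like a case-insensitive exact lookup.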

The Microsoft BigData Story: Presentation Transcript

  • Microsoft’s BigData Story: @LynnLangit, April 2013, Big Data Tech Con
  • Data Expertise / Lynn Langit • Industry awards: Microsoft MVP for SQL Server; Google GDE for Cloud Platform; 10Gen Master for MongoDB • Practicing architect • Technical author / trainer: Pluralsight, Google Cloud series; DevelopMentor, SQL Server series; 2 books on SQL Server BI • Former Microsoft FTE (4 years)
  • In a Relationship? BigData, NoSQL
  • BigData, NoSQL… => No Microsoft? • Big Data => keeping / getting more data: cheap storage, cloud storage, open source data projects (Hadoop) • NoSQL => schema-lite, scalable storage: NoSQL data projects, mostly open source, sharded replicas
  • In an (Open Source) Relationship? • NoSQL: MongoDB, Neo4j, Riak, Cassandra • Hadoop • Cloud: AWS, Heroku, Rackspace, OpenStack
  • Data Services DEMO: HDInsight (Hadoop)
  • The Reality: BigData, Small BigData
  • BigData Lifecycle Management: Locate, Quantify, Qualify, Replicate, Process, Present
  • Locating the data • You buy it: private source or public source • You find it: your own source, in SQL Server or on desktops
  • Finding Data in Data Markets • Windows Azure Data Market • DataMarket.com • Factual.com • InfoChimps
  • Data Services DEMO: Azure DataMarket
  • Database Lifecycle Management • Evaluating current processes • Improving processes • Adding new tools: SSDT (SQL Server Data Tools) • Data synchronization processes
  • Storing the data • Relational: SQL Server, which can use partitioning for scalability • Beyond relational via relational: specialized data types (XML, Hierarchy, Filestream/FileTable, Geospatial); columnstore index • Multi-dimensional / in-memory: OLAP cubes / mining models; tabular models
  • Big Data in SQL Server 2012: Relational Enhancements DEMO: Columnstore, XML, FileTable
  • Data Processing: raw data, pre-processed data, detail data, aggregate data, views
  • Valuing the data • De-duplicating • Validating • Correcting errors • Aggregating • Ranking / rating: social rating, e.g. Yelp-like; social scoring, e.g. Freebase-like
  • Data Services DEMO: Data Quality Services
  • Types of Data Quality Projects
    • T-SQL scripts (boolean match): exact matches with WHERE =, WHERE <>, IN; LIKE string matching with %
    • Full-text matching (semantic word match): CONTAINS
    • Semantic Search (semantic phrase match): SEMANTICSIMILARITYTABLE
    • SSIS tasks (transactional, multi-valued matching): see the SSIS task notes
    • DQS (KB matching): knowledge base rules/matches; data quality project to clean and correct data
    • MDS (one view of truth): versioned entities, attributes and rules
  • Data Presentation • View-only client • View & manipulate (hide-only) client • View & query (aggregate) client • View & query (drill through) client • View & mash-up (add new data) client • View & update client • Timeliness of data (latency) • Beauty of data
  • But, does it work in Excel?
    • Mash-up data with PowerPivot
    • Clean up data with Data Quality Services
    • Extract-Transform-Load with Data Explorer
    • Authorize with Master Data Services
    • Mine with 3rd party: Predixion
    • Import data, including Hadoop via ODBC
  • From Pivot Tables to Visualized Data Mash-ups with Mining DEMO: The Power of Excel
  • What about the UDM? • UDM / Data Mining is fully supported in SSAS • SSAS must be installed in this (Multidimensional) mode, which is mutually exclusive with Tabular mode • But should you use it anymore?
  • Big Data in SQL Server 2012: Non-Relational Features DEMO: Tabular Models, Data Mining
  • Data Consumability: Valid (accurate) • Recognizable (meaningful) • Appropriate (useful) • Beautiful (appealing) • Enjoyable (satisfying)
  • PowerView for Tabular Models DEMO: PowerView
  • Data Fluency and Job Roles: Consumer (view and understand) • Analyzer (view, manipulate and decide) • Cleaner (validate and update) • Artist (visualize and present)
  • BigData in SQL Server 2012
    • Relational engine: scaling via partitioning (tables, indexes) and PDW; columnstore indexes; special data types (XML, Hierarchy, FileTable)
    • Analysis Services engines: OLAP cubes; tabular models; data mining models
    • Other services: Data Quality Services; Master Data Services; StreamInsight
  • Other Data Services from Microsoft: Windows Azure Marketplace • SQL Azure • PowerPivot • Data Explorer
  • NoSQL – New Products / Betas: SSRS on Azure • Semantic Search • HDInsight (Hadoop on Azure) • PowerView • Cloud-based Data Explorer
  • Announced Futures
  • The Changing Data Landscape: RDBMS • NoSQL • Other Services
  • www.TeachingKidsProgramming.org • Free courseware (recipes) • Do a Recipe → Teach a Kid (ages 10+) • Java or Microsoft SmallBasic • C# on Pluralsight
  • Toward Data Craftsmanship… • Follow me: @LynnLangit; www.LynnLangit.com; YouTube: SoCalDevGal • Hire me: to help build your BI/Big Data solution; to teach your team next-gen BI; to learn more about using NoSQL solutions
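The "BigData, NoSQL… => No Microsoft?" slide lists sharded replicas as a NoSQL trait: data is split across nodes by a key, and each piece is copied to more than one node. A minimal Python sketch of the idea, assuming hash-based placement on a hypothetical four-node cluster with each key also copied to the next node. MongoDB, Cassandra, and Riak each use their own partitioning and replica-placement schemes, so this illustrates the concept only.

```python
import hashlib

# Hypothetical four-node cluster; real NoSQL stores use their own placement schemes.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def shard_for(key, num_shards=len(NODES)):
    """Map a key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings and so would move keys between runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(key, replication_factor=2, nodes=NODES):
    """Return the nodes holding copies of `key`: the primary shard's node plus
    the next replication_factor - 1 nodes, wrapping around the node list."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and the node count")
    start = shard_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


print(replicas_for("user:42"))  # two distinct nodes, stable across runs
```

Simple modulo placement reshuffles most keys when a node is added; that is one reason production stores tend to use range-based shard keys or consistent hashing instead.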
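The "Storing the data" slide lists the columnstore index as a way to go beyond relational storage inside SQL Server. The Python sketch below uses hypothetical sales data to contrast a row layout with a column layout, then applies run-length encoding as one simple example of why a column stored on its own compresses well. SQL Server's actual segment encodings and batch-mode execution are not modeled here.

```python
# Hypothetical sales rows in a row-oriented layout: each record stored together,
# roughly how a rowstore table keeps whole rows on a page.
rows = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "East", "amount": 75.5},
    {"order_id": 3, "region": "West", "amount": 42.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column, a columnstore-like layout."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress one column as [value, run_length] pairs; long runs of repeated
    values (common once a column is stored on its own) shrink to a single pair."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)
# An aggregate such as SUM(amount) only has to read the "amount" column,
# not every field of every row.
total_amount = sum(columns["amount"])
print(total_amount)                          # 237.5
print(run_length_encode(columns["region"]))  # [['East', 2], ['West', 1]]
```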
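The "Data Processing" slide describes a progression from raw data through pre-processed, detail, and aggregate data to views. A minimal Python sketch of those stages, assuming a hypothetical comma-separated feed of dates, cities, and visit counts; every function name is illustrative and not part of any Microsoft product.

```python
from collections import defaultdict

# Hypothetical raw feed: comma-separated lines of date, city, visit count.
RAW = [
    "2013-04-08,Boston, 3",
    "2013-04-08,boston,2",
    "2013-04-09,Seattle,4",
    "bad line",
]


def preprocess(lines):
    """Pre-processed data: split fields, trim whitespace, drop malformed lines."""
    parsed = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3 and parts[2].isdigit():
            parsed.append(parts)
    return parsed


def to_detail(parsed):
    """Detail data: one typed, normalized record per event."""
    return [{"date": d, "city": c.title(), "visits": int(n)} for d, c, n in parsed]


def aggregate(details):
    """Aggregate data: total visits per city."""
    totals = defaultdict(int)
    for row in details:
        totals[row["city"]] += row["visits"]
    return dict(totals)


def top_cities_view(totals, n=1):
    """View: a consumer-facing slice of the aggregate (the top-n cities)."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


totals = aggregate(to_detail(preprocess(RAW)))
print(totals)                   # {'Boston': 5, 'Seattle': 4}
print(top_cities_view(totals))  # [('Boston', 5)]
```

Keeping each stage separate means the detail layer can be reused for a different aggregate without re-parsing the raw feed.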
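The "Valuing the data" slide lists de-duplicating, validating, aggregating, and ranking. A small Python sketch of those steps over hypothetical Yelp-style reviews, assuming a 1-5 star scale and a plain average as the rating method; real social rating systems may weight reviews differently.

```python
# Hypothetical social reviews: (reviewer, venue, stars on a 1-5 scale).
REVIEWS = [
    ("ann", "Cafe Uno", 5),
    ("ann", "Cafe Uno", 5),  # exact duplicate submission
    ("bob", "Cafe Uno", 4),
    ("cat", "Deli Dos", 3),
    ("dan", "Deli Dos", 9),  # invalid: outside the 1-5 scale
]


def deduplicate(records):
    """De-duplicating: drop exact duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def validate(records, low=1, high=5):
    """Validating: keep ratings inside the allowed scale. A fuller pipeline
    might send rejects to a correction step instead of discarding them."""
    return [r for r in records if low <= r[2] <= high]


def rank_venues(records):
    """Aggregating and ranking: average stars per venue, highest first."""
    stars_by_venue = {}
    for _, venue, stars in records:
        stars_by_venue.setdefault(venue, []).append(stars)
    averages = {venue: sum(s) / len(s) for venue, s in stars_by_venue.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_venues(validate(deduplicate(REVIEWS)))
print(ranking)  # [('Cafe Uno', 4.5), ('Deli Dos', 3.0)]
```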
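The "Types of Data Quality Projects" slide contrasts exact matches (WHERE =), LIKE pattern matching with %, and knowledge-base matching in DQS. The Python sketch below imitates the first two and uses a lookup-table correction for the third. It is an illustration under stated assumptions, not how SQL Server or DQS implement matching, and the column values and knowledge base contents are hypothetical.

```python
import re

CITIES = ["Boston", "Bost", "Austin", "boston"]  # hypothetical column values


def exact_match(values, target):
    """Like WHERE col = target: keep values equal to target."""
    return [v for v in values if v == target]


def like_to_regex(pattern):
    """Translate a LIKE pattern into a regular expression: % matches any run of
    characters and _ matches exactly one. Bracket ranges such as [a-c] and the
    ESCAPE clause are not handled in this sketch."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like_match(values, pattern, case_insensitive=True):
    """Like WHERE col LIKE pattern. Case-insensitive matching is an assumption
    that mimics a case-insensitive collation; SQL Server follows the column's
    actual collation."""
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    regex = re.compile(like_to_regex(pattern), flags)
    return [v for v in values if regex.fullmatch(v)]


# Hypothetical knowledge base mapping known variants to a corrected value,
# loosely in the spirit of a DQS domain's synonyms and corrections.
KNOWLEDGE_BASE = {"Mass.": "Massachusetts", "MA": "Massachusetts"}


def kb_correct(values, kb=KNOWLEDGE_BASE):
    """Replace each value found in the knowledge base with its corrected form."""
    return [kb.get(v, v) for v in values]


print(exact_match(CITIES, "Boston"))  # ['Boston']
print(like_match(CITIES, "Bos%"))     # ['Boston', 'Bost', 'boston']
print(kb_correct(["MA", "Texas"]))    # ['Massachusetts', 'Texas']
```

Each step widens what counts as a match: exact equality, then wildcard patterns, then curated knowledge about which different strings mean the same thing.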
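The "BigData in SQL Server 2012" slide lists partitioning of tables and indexes as a scaling option. In SQL Server a partition function assigns each row to a partition from a sorted list of boundary values, and RANGE LEFT or RANGE RIGHT decides which side a boundary value itself belongs to. The Python sketch below models that assignment and the resulting partition elimination. `partition_number` and `partitions_for_range` are hypothetical names, not T-SQL functions (T-SQL exposes the partition number through `$PARTITION`), and the yearly boundaries are made up.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def partition_number(value, boundaries, range_right=True):
    """Return the 1-based partition that `value` falls into, given sorted
    boundary values. RANGE RIGHT puts each boundary value in the partition to
    its right; RANGE LEFT puts it in the partition to its left.
    n boundaries always yield n + 1 partitions."""
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


def partitions_for_range(low, high, boundaries, range_right=True):
    """Partitions a query filtering low <= value <= high must scan; all other
    partitions can be skipped (partition elimination)."""
    first = partition_number(low, boundaries, range_right)
    last = partition_number(high, boundaries, range_right)
    return list(range(first, last + 1))


# Hypothetical yearly boundaries for an order-date column.
BOUNDARIES = [date(2012, 1, 1), date(2013, 1, 1)]

print(partition_number(date(2011, 12, 31), BOUNDARIES))                   # 1
print(partition_number(date(2012, 1, 1), BOUNDARIES))                     # 2
print(partition_number(date(2012, 1, 1), BOUNDARIES, range_right=False))  # 1
print(partitions_for_range(date(2013, 1, 1), date(2013, 12, 31), BOUNDARIES))  # [3]
```

RANGE RIGHT is the usual choice for date boundaries like these, because each boundary date then starts its own partition.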