Top 10 Challenges of Making Big Data Real and Tips to Overcome Them
This workshop presentation was given by Rich Dill, Solutions Engineer at SnapLogic at the GigaOm Structure Data Conference, March 20-21, 2013 in New York City, NY.

What are the Top Ten Challenges?

1. A miracle occurs here - Of course we can connect to it…
2. There is always more data than you expected - Unless there is not enough data to be meaningful
3. Never mistake a memo for reality - Did you hear what I said or what I meant?
4. It is logically impossible to schedule for the unknown
5. There is life beyond American English - Eventually you will have to deal with other languages
6. Of course the data is accurate, clean and ready - Data quality issues can kill project schedules
7. Dealing with unstructured data is fun - Somewhere buried inside is your delimiter where you least expect it
8. The data and process is subject to… - Pick your acronym: PCI, FIX, HIPAA, SOX
9. The requirements once defined are set in stone - Requirements almost always evolve
10. The most critical data will be on the most difficult platform to access - “a good deal of our case data is on Notes running on AS400”

  • 1990s
      - Valuable data was being generated but was really living in siloed environments. The term MDM was not even coined until 2003.
      - As long as you could connect different systems together via a nightly, or sometimes even a weekly feed, that was pretty darn awesome!
      - Technologies like ESBs, EAIs, ETLs… flourished.
      - Data was mostly structured, sitting in RDBMS systems.
    2000s
      - Network speeds increased
      - Costs went down
      - Players like Salesforce and NetSuite started getting traction from the SMB market
      - Immense value on cost and agility
      - Flexibility to subscribe vs. perpetual licenses
    2005: Consumer / Social data
      - FB, Twitter, LinkedIn, amazon.com consumer reviews…
      - Humans generating massive amounts of preference data, likes and dislikes
      - Data was different: non-relational, unstructured. Real-time data
      - Huge volumes: petabytes
      - Providing immense value to the business on their customers
    2010: Machine
      - RFID tags, various other sensors, weblogs
      - ArcSight got bought out for $1.5B by HP
      - Massive amounts of data: exabytes
      - Splunk had a successful IPO last month
    SnapLogic
      - These 4 sources create an impedance mismatch!
      - Good luck doing all of this with an ESB
      - Structured vs. unstructured
      - Streaming vs. batch
      - Petabytes and exabytes vs. gigabytes
      - Pull vs. push
      - Hub and spoke
      - Unprecedented opportunity & desire to use data
      - Data silos (data fragmentation) unavoidable
      - Legacy apps, cloud apps, and Hadoop are driving this
      - Different locations, protocols, formats, and architectures
      - Data is more distributed & less accessible (less useful)
      - Compounding due to volume & variety of apps & data
      - ESB is just another connection
      - Enterprises must share data between their apps
      - Collect, combine, process data into valuable information
      - Competitive advantage will become necessity for survival
      - SnapLogic = data sharing platform
  • Apple-like model: we offer an API and about 200 Snaps
      - Build or buy
      - Easy to build with Java or Python: an intern out of school built Snaps in 4 days
      - Build or buy: containerization of access
      - Abstraction of the end point, so you do not need to know everything
  • Transcript

    1. Top 10 challenges of making big data real, and tips to overcome them
        Rich Dill, Solutions Engineer, SnapLogic
        rdill@snaplogic.com
    2. A play on Dave Letterman’s top 10
        • 1. A miracle occurs here - Of course we can connect to it…
        • 2. There is always more data than you expected - Unless there is not enough data to be meaningful
        • 3. Never mistake a memo for reality - Did you hear what I said or what I meant?
        • 4. It is logically impossible to schedule for the unknown - Or the relationship between developers and weathermen
        • 5. There is life beyond American English - Eventually you will have to deal with other languages
    3. A play on Dave Letterman’s top 10
        • 6. Of course the data is accurate, clean and ready - Data quality issues can kill project schedules
        • 7. Dealing with unstructured data is fun - Somewhere buried inside is your delimiter where you least expect it
        • 8. The data and process is subject to… - Pick your acronym: PCI, FIX, HIPAA, SOX
        • 9. The requirements once defined are set in stone - Requirements almost always evolve
        • 10. The most critical data will be on the most difficult platform to access - “a good deal of our case data is on Notes running on AS400”
    4. A miracle occurs here
        • Of course we can connect to it…
    5. And we know the image resonates, v2…
    6. SnapLogic Solution (diagram labels: Users, ESB, RDBMS, Data Center, Mobile, Enterprise, Amazon Redshift, Cloud, Big Data)
    7. There is always more data than you expected
        • Unless there is not enough data to be meaningful
            - It’s feast or famine
            - Distributed systems replicate data
                • At the site level and at the network level
            - 3x at the data center in Houston and 3x in Chicago
            - Replicated data can increase the cost of hardware, network and software
            - We are far from normal
                • Data is organized for performance and reliability, not space efficiency
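
The replication point on slide 7 comes down to simple multiplication. The sketch below is only an illustration: the function name and the 10 TB logical data set are assumptions, while the 3 copies at each of 2 sites mirror the Houston and Chicago example on the slide.

```python
def replicated_footprint_tb(logical_tb: float, copies_per_site: int, sites: int) -> float:
    """Raw storage consumed when every site keeps several full copies."""
    return logical_tb * copies_per_site * sites


# Hypothetical 10 TB data set, replicated 3x in each of 2 data centers.
raw_tb = replicated_footprint_tb(10.0, copies_per_site=3, sites=2)
print(raw_tb)  # 60.0
```

Under these assumptions, every logical terabyte costs six terabytes of raw disk before any network or software licensing is counted.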
    8. It is logically impossible to schedule for the unknown
        • Or my theory of the relationship between developers and weathermen
        • The accuracy of an estimate is a function of the number of variables and the length of the project
    9. Never mistake a memo for reality
        • Did you hear what I said or what I meant?
        • Are you a literal listener?
            - Psycholinguistics should be required reading for project managers
        • Waterfall process
            - Allows you to build something the user wants today that you deliver in 9 months or two years
        • Iterative process
            - We’ll figure it out as we go along
            - Not really suited for deep architectural designs
        • Process
            - Listen
            - Process
            - Repeat back “this is what I heard you say”
        • Nothing beats showing a functioning prototype, demo or wireframe
    10. There is life beyond American English
        • Eventually you will have to deal with other languages
            - German will test your user interface spacing
            - Cyrillic will add to the character set
        • Middle Eastern languages
            - Read right to left
            - Some languages don’t have consistent spelling
        • Far Eastern languages
            - There is no such thing as Chinese
                • Mandarin is the “Speech of Officials”
                • Cantonese is used in Hong Kong
                • Hangul is used in Korea
                • Japanese
                    - Kanji is adopted Chinese characters
                    - Kana is a combination of Hiragana & Katakana
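
A small sketch of the spacing and character-set points on slide 10. The sample words (translations of a "Save" button label) and the helper name are assumptions chosen for illustration; the code only shows that character count and UTF-8 byte count diverge once text leaves ASCII.

```python
def text_footprint(word: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for a label."""
    return len(word), len(word.encode("utf-8"))


# Hypothetical UI label "Save" in three languages.
samples = {"English": "Save", "German": "Speichern", "Russian": "Сохранить"}

for language, word in samples.items():
    chars, utf8_bytes = text_footprint(word)
    print(f"{language}: {chars} characters, {utf8_bytes} bytes")
```

The German label needs more than twice the screen width of the English one, and the Cyrillic label takes two bytes per character in UTF-8, so fixed-width fields sized in bytes for English text can truncate it.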
    11. Of course the data is accurate, clean and ready
        • How good is the data?
            - Profiling the data is key to accurate project estimates
            - What percentage of the data is null, blank, invalid?
        • Data lifecycle includes
            - Acquisition or creation
            - Validation
                • Business rules
                • Which may result in…
        • Data cleansing
            - Zip code tables, barcodes, D & B credit ratings
            - Public data resources: www.data.gov
        • Storage in an accessible format/location
        • Archiving
            - Industry or legal rules for archiving
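
The profiling question on slide 11 ("What percentage of the data is null, blank, invalid?") can be answered with a few lines of code before any estimate is given. This is a minimal sketch, not a profiling tool: the function name, the sample zip codes, and the 5-digit validity rule are all assumptions made for the example.

```python
from collections import Counter


def profile_column(values, is_valid=lambda v: True):
    """Return the percentage of null, blank, invalid and ok values in a column."""
    counts = Counter()
    for value in values:
        if value is None:
            counts["null"] += 1
        elif isinstance(value, str) and value.strip() == "":
            counts["blank"] += 1
        elif not is_valid(value):
            counts["invalid"] += 1
        else:
            counts["ok"] += 1
    total = len(values) or 1  # avoid division by zero on an empty column
    return {key: 100.0 * counts[key] / total for key in ("null", "blank", "invalid", "ok")}


# Hypothetical zip code column; the validity rule is an assumed business rule.
zip_codes = ["94403", None, "", "1234", "10001", "   ", "ABCDE", "60601"]
is_us_zip = lambda v: len(v) == 5 and v.isdigit()

report = profile_column(zip_codes, is_us_zip)
print(report)
```

In this made-up sample only 3 of 8 values pass, which is the kind of number that should change a project schedule before the work starts.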
    12. Dealing with unstructured data is fun
        • Somewhere buried inside is your delimiter where you least expect it
        • Email is one of the most complex to handle
        • Hierarchical data structures must be mapped or navigated
        • XML is not the end all, be all of structured data formatting
            - JSON
            - BSON
            - SomethingImissedSON
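
The buried-delimiter problem on slide 12 is easy to reproduce. The sketch below uses an invented record (the name and comment are assumptions) in which commas and quotes appear inside quoted fields, and compares a naive split with Python's standard `csv` reader.

```python
import csv
import io

# Hypothetical record: the delimiter (a comma) also appears inside two quoted fields.
line = '1,"Smith, Ann","said ""hello"", then left"'

# Naive approach: split on every comma, ignoring the quoting rules.
naive = line.split(",")

# Quote-aware approach: let the csv module apply the quoting rules.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split produces five fragments for a three-field record, while the quote-aware reader recovers the original three fields.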
    13. Big Data Reference Architecture (diagram: 1. Collect, 2. Translate & Enrich, 3. Distribute; labels: DB, Structured Data, DB, Data View, Unstructured Data)
    14. The data and process is subject to…
        • Pick your acronym: PCI, FIX, HIPAA, SOX
        • Almost every industry has some form or another of data handling protocols that must be addressed
        • These protocols are a combination of
            - Data creation
            - Data access
            - Technology and workflow
            - It is not just encryption and access
        • Know your customer’s requirements!
    15. The requirements once defined are set in stone
        • What your users know today is not what they will know tomorrow…
        • Requirements evolve
        • Why do you think they call them users?
            - If you are successful they will want more
        • Things change
            - Economy
            - Budgets
            - Timeframe
            - Management
        • Feature creep is not a bad thing if budgets and timelines also creep
    16. The most critical data will be on the most difficult platform to access
        • “A good deal of our case data is on Notes running on AS400”
        • Discover where the data is first
        • When can you access it?
            - 24x7, after hours, on demand
        • Throughput is key
            - Either during business hours or afterwards
        • What conditions?
            - One time download
            - Scheduled
            - Event based
            - Stream
        • What about security requirements?
            - There is a performance impact of encryption during transmission
    17. Containerization with Snaps
        • BUY
            - SnapStore
            - Certified and supported by SnapLogic
        • BUILD
            - SDK + API
            - Java, Python
            - Customer, Partner or SnapLogic
    18. The eleventh rule
        • Free software sometimes is worth the cost
            - Or the money you save on licenses is multiplied by the cost of training and consultants
            - In most cases labor is one of the biggest costs of most software projects
        • Open source is NOT the same as free!
            - Subscription vs. perpetual licenses
            - Does the customer need to
                • Expense or capitalize software licenses
    19. Thank you
        For more information: www.snaplogic.com
        BDaaS - BigData as a Service
