© 2015 IBM Corporation
Bring Data Analytics to the Edge
Ton Machielsen - Customer Technical Professional
Cloud Data Services – IBM Digital Sales
Cloudant – NoSQL Database as a Service
Powerful DBaaS
For apps that need:
• Elastic scalability
• High availability
• Data model flexibility
• Data mobility
• Text search
• Geospatial
Available as:
• Fully managed DBaaS
• On-premises private cloud
• Hybrid architecture
With custom coding that enables unique data needs to be developed in days, not weeks
dashDB – Analytics Warehouse as a Service
For apps that need:
• Elastic scalability
• High availability
• Data model flexibility
• Data mobility
• Text search
• Geospatial
Available as:
• Fully managed DBaaS
• On-premises private cloud
• Hybrid architecture
• Announced in October 2014
• DB2 BLU columnar technology + Netezza in-database analytics
• Rapid deployment of large-scale data warehouses
• Flexible options for both volume and processing speed
• Unified architecture that enables hybrid data processing on-premises & cloud
[Diagram labels: BLU Acceleration; Netezza In-Database Analytics; Cloudant Database as a Service]
IBM Watson Analytics
Self-service analytics capabilities in the cloud
• Fully Automated Intelligence
• Natural Language Dialogue
• Guided Analytic Discovery
• Single Analytics Experience
Use Case / What we will see today
Management of a fictitious airline company, XYZ, notices a stagnation in growth.
Marketing is asked to conduct a Customer Satisfaction Survey to investigate which
areas to improve.
1. Survey results are delivered to be analyzed
2. Survey data is imported into Cloudant for storage
3. Cloudant data is exported to dashDB for analysis
4. Watson Analytics is used to perform the analysis on the data
Survey data delivered as CSV file
CSV file gets imported into Cloudant
python csv-import.py -f <importfile.csv> -u <username> -d <database name>
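The slide does not show what csv-import.py does internally. As a rough sketch only, an import of this kind could read each CSV row into a JSON document and send the batch to the database's `_bulk_docs` endpoint, which Cloudant shares with CouchDB. The function names, the account URL, and the credentials below are hypothetical placeholders, not taken from the actual script.

```python
import base64
import csv
import io
import json
import urllib.request


def rows_to_docs(csv_text):
    """Turn CSV text into one dict per row, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def build_bulk_request(account_url, database, username, password, docs):
    """Build (but do not send) a POST to <account_url>/<database>/_bulk_docs."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=f"{account_url}/{database}/_bulk_docs",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical sample data and account details, for illustration only.
sample = "Airline Code,Class,Satisfaction\nAA,Eco Plus,4.0\n"
docs = rows_to_docs(sample)
request = build_bulk_request(
    "https://example.cloudant.com", "survey", "user", "secret", docs
)
# Passing `request` to urllib.request.urlopen() would perform the upload.
```

Because `csv.DictReader` does no type conversion, every value is uploaded as a string unless the import converts it first.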
Sample JSON document in Cloudant
{
"_id": "0940405adf4b22beab341d90e4000034",
"_rev": "1-a727ce80b1e01d1e7b8365874d4f64a8",
"Origin State": "Texas",
"Airline Code": "AA",
"Type of Travel": "Business travel",
"Shopping Amount at Airport": "0",
"Airline Name": "Paul Smith Airlines Inc. ",
"Scheduled Departure Hour": "8",
"No of Flights p.a.": "8",
"Airline Status": "Blue",
"Flight Distance": "1172",
"Orgin City": "Dallas/Fort Worth, TX",
"Price Sensitivity": "1",
"Arrival Delay greater 5 Mins": "no",
"Class": "Eco Plus",
"Arrival Delay in Minutes": "0",
"Gender": "Male",
"Age": "40",
"Flight cancelled": "No",
"No of Flights p.a. grouped": "1 to 10",
"% of Flight with other Airlines": "19",
"Departure Delay in Minutes": "5",
"Year of First Flight": "2003",
"Flight date": "02-14-2014",
"Age Range": "40-49",
"Destination State": "Virginia",
"No. of other Loyalty Cards": "2",
"Satisfaction": "4.0",
"Day of Month": "14",
"Destination City": "Washington, DC",
"Flight time in minutes": "140",
"Eating and Drinking at Airport": "60"
}
Integration between Cloudant and dashDB
Create a data warehouse from the Cloudant dashboard.
The data warehouse is created automatically on IBM
Bluemix, using an automatically generated dashDB
instance.
The automatic Schema Discovery Process models the
NoSQL database as a relational database
structure, and the data is loaded into the dashDB
database.
Database structure in dashDB
Migrating Cloudant JSON into dashDB
• Cloudant’s Schema Discovery Process (SDP) translates JSON documents into a schema
(or set of tables) that dashDB understands
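The deck does not describe how the SDP derives its tables. Purely as an illustration of the idea, the sketch below flattens nested objects into one row whose column names join parent and child keys in upper case, which is the naming pattern seen in dashDB columns such as CLIENT_CLIENTNAME. The function name `flatten` and the sample document are assumptions, and the real process may work differently.

```python
def flatten(document, prefix=""):
    """Flatten nested objects into a single row of PARENT_CHILD columns.

    Array values are skipped here, since they do not fit into a single row.
    """
    row = {}
    for key, value in document.items():
        column = f"{prefix}{key}".upper()
        if isinstance(value, dict):
            # Recurse into the nested object, prefixing its keys.
            row.update(flatten(value, prefix=f"{column}_"))
        elif isinstance(value, list):
            continue
        else:
            row[column] = value
    return row


# Abbreviated, hypothetical document for illustration.
doc = {
    "_id": "019b716168d45be2c2bd8371d4000d5c",
    "Amount": "20000",
    "Client": {"ClientID": "1000634", "ClientName": "CHESAPEAKE BAY FOUNDATION"},
    "Issues": [{"Code": "ENVIRONMENT/SUPERFUND"}],
}
row = flatten(doc)
```

Here `row` holds the columns _ID, AMOUNT, CLIENT_CLIENTID and CLIENT_CLIENTNAME, while the Issues array is left out of the main row.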
Sample JSON document in Cloudant
{
"_id": "019b716168d45be2c2bd8371d4000d5c",
"_rev": "1-4b4994f0dd1ddc56b96bd2a16cb080e7",
"Received": "2010-10-18T11:36:10.057",
"Period": "3rd Quarter (July 1 - Sep 30)",
"Amount": "20000",
"Client": {
"ContactFullname": "JOHN MARK TRUMBORE",
"ClientPPBCountry": "USA",
"ClientID": "1000634",
"GeneralDescription": "Non-profit organization dedicated to restoring the Chesapeake Bay and its tributary rivers",
"ClientPPBState": "DISTRICT OF COLUMBIA",
"ClientCountry": "USA",
"IsStateOrLocalGov": "0",
"ClientState": "DISTRICT OF COLUMBIA",
"ClientName": "CHESAPEAKE BAY FOUNDATION",
"ClientStatus": "1"
},
"GovernmentEntities": [
{
"GovEntityName": "HOUSE OF REPRESENTATIVES"
}
],
"Lobbyists": [
{
"LobbyistName": "BORSKI, ROBERT A JR",
"LobbyisteIndicator": "0",
"LobbyistStatus": "0",
"OfficialPosition": "Member of Congress (1983-2003)"
},
{
"LobbyistName": "TRUMBORE, JOHN MARK",
"LobbyisteIndicator": "0",
"LobbyistStatus": "0",
"OfficialPosition": "Leg Asst (Rep. McHale), Leg Dir (Rep. R. Brady)"
}
],
"Year": "2010",
"Type": "THIRD QUARTER REPORT",
"Registrant": {
"RegistrantPPBCountry": "USA",
"RegistrantName": "Borski Associates",
"RegistrantCountry": "USA",
"RegistrantID": "84376",
"GeneralDescription": "Government Relations Consulting",
"Address": "4015 Fitler Street\r\nPhiladelphia, PA 19114"
},
"Issues": [
{
"Code": "ENVIRONMENT/SUPERFUND",
"SpecificIssue": "Chesapeake Clean Water and Ecosystem Restoration Act (HR 3852)\nChesapeake Bay Program Reauthorization and Improvement Act (HR 5509)"
}
]
}
Database structure in dashDB
LOBBY-SEARCH
• _ID
• _REV
• AMOUNT
• CLIENT_CLIENTCOUNTRY
• CLIENT_CLIENTID
• CLIENT_CLIENTNAME
• CLIENT_CLIENTPPBCOUNTRY
• CLIENT_CLIENTPPBSTATE
. . .
. . .
LOBBY-SEARCH_AFFILIATEDORGS
• ARRAY_INDEX
• AFFILIATEDORGCOUNTRY
• AFFILIATEDORGNAME
• AFFILIATEDORGPPBCOUNTRY
• _ID
LOBBY-SEARCH_FOREIGNENTITIES
• ARRAY_INDEX
• FOREIGNENTITYCONTRIBUTION
• FOREIGNENTITYCOUNTRY
• FOREIGNENTITYNAME
• FOREIGNENTITYOWNERSHIPPERCENTAGE
• FOREIGNENTITYPPBCOUNTRY
• FOREIGNENTITYSTATUS
• _ID
LOBBY-SEARCH_ISSUES
• ARRAY_INDEX
• CODE
• SPECIFICISSUE
• _ID
LOBBY-SEARCH_OVERFLOW
• EXCEPTION
• WARNING
• _ID
LOBBY-SEARCH_GOVERNMENTENTITIES
• ARRAY_INDEX
• GOVENTITYNAME
• _ID
LOBBY-SEARCH_LOBBYISTS
• ARRAY_INDEX
• LOBBYISTINDICATOR
• LOBBYISTNAME
• LOBBYISTSTATUS
• OFFICIALPOSITION
• _ID
Keys with array-values are stored in separate tables.
"Lobbyists": [
{
"LobbyistName": "BORSKI, ROBERT A JR",
"LobbyisteIndicator": "0",
"LobbyistStatus": "0",
"OfficialPosition": "Member of Congress (1983-2003)"
},
{
"LobbyistName": "TRUMBORE, JOHN MARK",
"LobbyisteIndicator": "0",
"LobbyistStatus": "0",
"OfficialPosition": "Leg Asst (Rep. McHale), Leg Dir (Rep. R. Brady)"
}
],
"Year": "2010",
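As a hedged sketch of how an array-valued key could end up in its own table, the function below turns each array element into a row that carries its position (ARRAY_INDEX) and the parent document's _ID, mirroring the columns of tables such as LOBBY-SEARCH_LOBBYISTS. The function name `array_rows`, the zero-based index, and the abbreviated sample document are assumptions rather than documented SDP behavior.

```python
def array_rows(document, key):
    """Build child-table rows for one array-valued key.

    Each element becomes a row holding its position in the array and the
    parent document's _id, so the rows can be joined back to the main table.
    """
    rows = []
    for index, element in enumerate(document.get(key, [])):
        row = {"ARRAY_INDEX": index, "_ID": document["_id"]}
        for field, value in element.items():
            row[field.upper()] = value
        rows.append(row)
    return rows


# Abbreviated, hypothetical document for illustration.
doc = {
    "_id": "019b716168d45be2c2bd8371d4000d5c",
    "Lobbyists": [
        {"LobbyistName": "BORSKI, ROBERT A JR", "LobbyistStatus": "0"},
        {"LobbyistName": "TRUMBORE, JOHN MARK", "LobbyistStatus": "0"},
    ],
}
lobbyist_rows = array_rows(doc, "Lobbyists")
```

Keeping _ID on every child row is what allows a relational join between the main table and each array table.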
Watson Analytics connection into dashDB
Datasets in Watson Analytics
Analysis in Watson Analytics
Analysis in Watson Analytics
Information delivery from Watson Analytics
Q&A and more info
Cloudant – http://www.cloudant.com
dashDB – http://www.dashdb.com
Watson Analytics – http://www.watsonanalytics.com
Bluemix – http://www.bluemix.net
Me! – http://ibm.biz/ton_machielsen

Editor's Notes

  • #5 We have arrived at a tipping point where the abundance of data, emergence of cloud, advances in analytics, new user experience design and business models mean data-driven decisions can now be an essential, daily and valuable activity for business people. No longer just for data scientists or IT: marketing, sales, operations, finance and HR professionals can gain answers they need from all types of data.

    This requires a revolution in analytics technology, helping people acquire and refine data, discover insights, predict outcomes, visualize results, create reports, and collaborate with others in a unified user experience that speaks the language of business.

    Learn more about IBM Watson Analytics at WatsonAnalytics.com. Visit the site to get started with Watson Analytics for free. Register to access Watson Analytics on the cloud. Watch demos and how-to videos, read content and talk to our experts through our community forum.

    Four key takeaway points:

    Watson Analytics brings together a complete set of self-service analytics capabilities on the cloud. You bring your problem, and Watson Analytics helps you acquire the data, cleanse it, discover insights, predict outcomes, visualize results, create reports or dashboards, and collaborate with others.

    Just bring your data, and Watson Analytics will do the rest. By automating all the steps of data access and refinement, predictive analytics, and visual storytelling, Watson Analytics jumpstarts your analysis and accelerates your time to value. It immediately starts you off with a visual story that illustrates what you need to know. Instead of fumbling over data or searching for answers, you can focus on understanding your business and effectively communicating results to stakeholders.

    Watson Analytics speaks the language of your business. Simply type in what you would like to see and Watson Analytics produces comprehensive results that explain why things happened and what's likely to happen, all in the familiar terms of your business. And as you interact with the results, you can continuously fine-tune your questions to get to the heart of the matter.

    Watson Analytics features the use of predictive analytics to surface the most relevant facts and uncover unforeseen patterns and relationships. This sparks the right questions to ask and directs your attention to the parts of your business that matter most.
  • #12 I want to pause briefly to discuss what we call the Schema Discovery Process (SDP). dashDB is built on a relational database, where the data is stored in structured relational tables. Cloudant stores JSON documents, where all the data is encapsulated in a single record. To move data between these two systems, we need to be able to translate our JSON docs into a schema (or set of tables) that dashDB understands. This is exactly what the SDP does. It scans your Cloudant database and intuits the implicit structures in your data. It then creates that proper schema in dashDB and copies the data over. While not a perfect solution, it performs exceptionally well with relatively simple and homogenous Cloudant databases. The SDP can help you discover how your data is organized, and that can power a whole suite of functionality within Cloudant.