


One Team                                             6 General                                                            ...
Portfolio Overview                                         Consumer                                                       ...
Enterprise Portfolio                                  Infrastructure Software                                             ...

SnapLogic Overview
Agenda• Company Overview• snapLogic: Data Sharing Platform• Customer Use Cases
Strong Team, Strong Backing  Gaurav Dhillon – Founder, CEO  Co-Founder, former CEO, Informatica  Scott Edgington – SVP, Gl...
Selected Customers
SnapLogic: Data Sharing     ESB                RDBMS        ConsumerPre-2000: Legacy Data           2007: Consumer Cloud  ...
ChallengesThe Problem• Technology - TCV mismatch with legacy architectures     - Data Types     - Complexity     - Velocit...
Data Sharing Platform      Connect                                                       Scale       Any App              ...
Connect• Connect to data wherever it lives  - Modern, web oriented architecture connects everything  - Applications, datab...
Translate & Enrich• Translate with Snaps - standardized interfaces   - Between data formats & protocols, modern & legacy  ...
120 Snaps and counting  Easily Extensible               Distribute• Build or Buy              • SDK, APIs, IDEs• snapLogic...
Deliver• Share results on-time and in real-time• Batch/Schedule & Streaming• Event based triggers (via URLs)• Simultaneous...
ScaleVertical Scalability                      On-Demand • Processing Optimizations               To Match Load • Multi-th...
App Connectivity Across Domains• Application Connectivity  Everywhere• Deploy Anywhere                                 Pub...
Design: 100% Web-based
Every Component Has A URL            https://demo.snaplogic.co            https://demo.snaplogic.co   https://demo.snaplog...
REST Based Location Independence
REST Based Location Independence
Design: HTML5 iPad App
Customer Deployments
Selected Customer Examples   HR Workflow Orchestration             Big Data   Process Orchestration       Social, Enterpri...
Powering Outback’s HR System                                                               Exchange                       ...
Outback Steakhouse Plans                                           Real-time promotions        POS Data                   ...
Big Data Reference Architecture      1                       2                              3 Connect            Translate...
Social Meets Enterprise
Gamer Relationship Management                                        •   User information                                 ...
Major Electronics Retailer exampleBRICK AND MORTAR STORES                                                              Sto...
Pandora– Nothing but Cloud
SnapLogic: Data Sharing     ESB                RDBMS        ConsumerPre-2000: Legacy Data           2007: Consumer Cloud  ...
Data Sharing Platform      Connect                                                       Scale       Any App              ...
Thank You!

Microbial cells outnumber human cells        Average adult human body : ~ 1013 cells        Microbial content in human : ~...
Questions addressed• What are the different microbes present in a given environmental habitat/niche ?• What are their rela...
The real challenge for NGS and Metagenomics lies not in data acquisition but               in performing a meaningful anal...
Expected                                             Taxonomy                                                             ...
Gut Microbial samples from healthy children                Gut Microbial sample from malnourished children          Extrac...
Courtesy: Dennis Freeman
The unexpected application of cheap sequencing• Despite the obvious possibilities of sequencing many  new genomes, high th...
Example: SHAPE-Seq                                       Multiplexing           +               -                         ...
Example: S. aureus plasmid pT181 sense RNA                                                                     Initial rat...
    •    •
                                          

What is our security R&D goal?          • Right Information to Right User for Right Purpose          Why?          • Achie...
What data to protect from whom?              • Template based watermarking                            • Logic formalizatio...
Challenges faced• Environment heterogeneity• Requirements ambiguity• Evolving compliance landscape• Proofs of solution tru...
 The GRC Universe Global Macro Analysis and view on future Adopting GRC – TCS POV     Organizational challenges     I...
Dodd-Frank             Basel                                                        Credit Risk                  AML Act  ...
• Japan- Risks of Natural Disasters, High Debt, Volatile Currency, High Dependence on exports                             ...
Conceptual inputs source: Deloitte Research
Operational                                                   Market Risk                                                 ...
―We must distill down vast amounts of data intosecurity intelligence — prioritized, actionableinsight. To prioritize actio...
Model, Simulate, Act           Community                                                    ContextPatterns, meaningful   ...
Ap p l i                                     c a t i o Bu s i                                 n s                     Pr o...

    Study Released 16 May
Integrated Governance, Risk & Compliance (GRC) and QualityVision      Management for Better Business Performance          ...
Metrics                        Simulation/AnalyticsSolutions                                  Regulatory           Operati...
Area of            Functions/Compliance           Standards           Processes          Risks              Controls    Co...
TCS Innovation Forum 2012 - Day2: May 15 and 16, Le Meridien Cambridge, MA
  • Engagement model: grounded in reality, not ivory tower
  • Complexity with thousands of campaigns being turned off and on
  • Lead-lag effect below
  • Need to expand future directions and discuss topics where AOL customers can engage
  • Key off hot button topic
  • Query keywords may have different meanings for different users
  • At the risk of sounding too militaristic, building a successful new company is like preparing for the battle of your professional life. Just as you would if you were preparing to enter a real military battle, you want to surround yourself with the best team, you want to gather and assess all of the market intelligence, you want to soften up the underbelly of the enemy with a pre-emptive air strike, and then you want the full force of the ground troops to help you execute your plan.
    When we set out to build Andreessen Horowitz, we thought: “Could a venture capital firm actually be part of the arsenal that the founding team could leverage on the battlefield?”
    We realized that to do so would require a wholesale reconstruction of the traditional venture capital business to create a modern venture capital firm capable of helping accelerate the founding team’s game plan.
    We’d like to spend the next hour taking you through our thinking on the key elements of a modern venture capital firm. I will briefly frame the discussion, and then we’ve asked each of the heads of our operating groups to walk you through their areas of focus.
    Please feel free, of course, to jump in with questions at any time.
  • Most significantly, we have designed the firm from the ground up to function as a single operating entity. We have essentially broken the single-threaded relationship between GP and portfolio company so that we can deploy the resources of the entire firm in furtherance of our portfolio companies.
    What does this mean for you?
      - You have the strength of 20 operating partners to complement your relationship with whichever GP is on your board. The implication is that the whole set of relationships that the firm has is available to you; you are not limited to only those relationships and time that your GP has.
      - You will get more GP time, period, since we are providing them significant leverage by disaggregating their job into the functional units.
      - We have staffed our operations team with functional experts in their fields, so you will get best-of-breed support in all areas, as opposed to simply relying on a single GP to be a jack of all trades.
  • Other terms instead of resource are: endpoint, nodes, data source/target, SaaS applications. We prefer the term “resource” because it aligns with the REST terminology, where any touch point that can be identified via a URI is a resource.
  • Not just building another integration company? DNA of the team.
  • Across industries and verticals. Modern and old school. Discuss more about what many are doing.
  • 1990s
      - Born from science and computation
      - Entered enterprise
      - ESBs, flat files, DBs, etc.
      - Structured data, relational, batch
      - GBs/TBs
      - Immense value in legacy information and historical data
    2000s
      - Network speeds increase
      - Costs go down
      - SMBs are early adopters
      - GBs/TBs
      - Immense value on cost and agility
    2005: Consumer
      - FB, Twitter, etc.
      - Humans generating massive amounts of preference data
      - Social indicators and signals
      - Non-relational, unstructured, real-time data
      - Petabytes
      - Immense value to the business on their customers
    2010: Machine
      - Sensors, weblogs, etc.
      - Massive amounts of data
      - Exabytes
    SnapLogic
      - 4 sources create an impedance mismatch
      - Good luck doing all of this with an ESB
      - Structured vs. unstructured
      - Streaming vs. batch
      - GBs vs. exabytes
      - Pull vs. push
      - Hub and spoke
      - Unprecedented opportunity & desire to use data
      - Data silos (data fragmentation) unavoidable
      - Legacy apps, cloud apps, and Hadoop are driving this
      - Different locations, protocols, formats, and architectures
      - Data is more distributed & less accessible (less useful)
      - Compounding due to volume & variety of apps & data
      - ESB is just another connection
      - Enterprises must share data between their apps
      - Collect, combine, process data into valuable information
      - Competitive advantage will become necessity for survival
      - snapLogic = data sharing platform
      - Companies that will do all of these are the ones that will succeed
    ----- Meeting Notes (3/12/12 13:44) -----
  • A Data Sharing Platform must: Connect, Translate, Enrich, Deliver
    Platform characteristics = Scale, Extend, Design, Deploy
      - Connect securely to any application @ any location
      - Translate between data formats & protocols
      - Enrich data; combine relational, structured, unstructured data types
      - Deliver results everywhere needed, on time
      - Quickly scale up & down to match load / demand
      - Open & extensible architecture: REST, APIs, and SDK
      - Simple, visual interface: enable a broad set of users
    This platform provides value:
      - Agility & flexibility: make changes quickly & easily
      - Enable new business capabilities: combine legacy & modern applications & data; enrich apps to increase their value
      - Avoid vendor lock-in: loosely couple business process with applications; swap out apps with minimal disruption
      - Operational efficiency: keep developers focused on combining data, not figuring out how to access it; empower a broader set of users with simpler tools
  • Modern, web-oriented approach (RESTful to the core). Everything in snapLogic is a URL. Accessible by any web client.
  • Easy to build. A kid out of school built Snaps in 4 days. Containerization. Go viral in the VMware environment.
  • Being 100% RESTful, snapLogic provides scalability and resilience similar to that of web applications.
      - Horizontal scalability is offered via multiple Worker nodes. The Head node receives all requests and assigns Pipelines to Worker nodes. More Worker nodes can be added when high data traffic is expected.
      - Vertical scalability is provided by maximizing performance on a single instance. Of significance are:
          - In-memory operations that perform data transformations in memory. Where possible, the data is not transported from Component to Component while processing; rather, ‘zero data copies’ are made as Components work off the same data.
          - Resources that offer bulk data handling are leveraged to the full extent, for example Salesforce’s Bulk API and Oracle’s Bulk Loader.
      - High availability (or failover) is offered by:
          - Active/passive configuration of the Head node. The passive node maintains a mirror of the active node and is ready to take over in case there’s a failure at the active node.
          - If a Worker node goes down, the active node moves the job assignments to an active Worker node.
  • From cloud to on-prem: certificate-based SSL; no firewall rules or config; connections from inside out.
  • The main pieces of the snapLogic Platform are:
      - snapLogic Server: This is the command and control center. It is responsible for authorization, security, administration, pipeline definitions, log management, etc.
      - Component Container: This is the execution engine. All data transformations are carried out by the Component Container. It is independent of the Server in the sense that it can be deployed separately from the Server, but it still relies on the Server for commands. The communication is 100% REST based. Note that multiple Component Containers can be managed by a single Server. This is an extremely powerful concept that allows true cloud-style implementation with distributed architecture.
      - Management Console: This is the primary UI for the system administrator. Authorization, management, and monitoring tasks are controlled by this console. There are actually 2 pieces to it (not shown in the diagram): a browser-based UI and an iPad UI (HTML5).
      - Designer: This is a visual tool for building and testing Pipelines. It is a browser-based thin client application. It provides pre-packaged Components as well as the ability to add new Snaps (Components).
    Here is a brief description of all the boxes in the diagram:
      - snapLogic Server: the ‘control plane’
          - SnAPI: REST API interface to the snapLogic Platform
          - Data Security: security of data-in-motion
          - Authentication & Authorization: accounts and privileges (supports LDAP authentication)
          - Administration: admin functions and server maintenance
          - Pipeline Manager: definition, execution, and monitoring of integration pipelines
          - Scheduler & Notification: management of Pipeline executions, and status alerts
          - Runtime Statistics: detailed tracking of data passing through Pipelines
          - Log Manager: access, exception, and trace log files
          - Repository Manager: metadata management of Pipelines
      - Component Container: the ‘data plane’. Executes Pipelines and applies data transformations
      - Sidekick: special Component Container for on-premise resources
      - Designer: browser-based visual tool for creating Pipelines
      - Management Console: browser-based console for management and monitoring of Pipelines
      - Snap: collection of specialized Components related to a specific resource, e.g. Salesforce
      - SnapStore: snapLogic’s online marketplace for Snaps
      - Snap SDK: Java- and Python-based SDK for private Snap development
  • Stats: 2.5MM SKUs with over 40MM attributes; 20 back-end systems; 30+ back-end changes/month; web traffic increasing at 1MM uniques/month; active spaces. Tried and failed at MDM; a single source cannot control something like this. The fridge gets passed along at the pace of this business with 20MM SKUs. Latency in information reaching the store. Can’t change pricing dynamically in real time. ETL and EAI cannot do this due to latency. Modern arch is about streaming and caching.
  • Need to work with a designer to clean up this slide and research if any of these apps are on-prem. 30 SaaS applications. Quick implementations. Lower TCO.
  • TCS Innovation Forum 2012 - Day2: May 15 and 16, Le Meridien Cambridge, MA
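
The scalability note above describes a Head node that assigns Pipelines to Worker nodes, lets more Workers be added under load, and moves job assignments off a Worker that goes down. The following is a minimal Python sketch of that idea only. The class name `HeadNode`, its methods, and the least-loaded assignment policy are assumptions made for illustration; the notes do not say how snapLogic actually picks a Worker, and the active/passive mirroring of the Head node is not modeled here.

```python
class HeadNode:
    """Toy model of a head node that hands pipelines to worker nodes.

    Hypothetical names throughout; this is not snapLogic's API.
    """

    def __init__(self, workers):
        # Map each worker name to the pipelines currently assigned to it.
        self.assignments = {w: [] for w in workers}

    def add_worker(self, worker):
        # Horizontal scaling: register an extra worker when load is expected.
        self.assignments.setdefault(worker, [])

    def assign(self, pipeline):
        # Assumed policy: pick the worker with the fewest pipelines
        # (ties go to the worker registered first).
        worker = min(self.assignments, key=lambda w: len(self.assignments[w]))
        self.assignments[worker].append(pipeline)
        return worker

    def worker_failed(self, worker):
        # Failover: drop the failed worker and reassign its pipelines.
        orphaned = self.assignments.pop(worker, [])
        if orphaned and not self.assignments:
            raise RuntimeError("no active worker left to take over")
        return {p: self.assign(p) for p in orphaned}


head = HeadNode(["w1", "w2"])
for name in ("p1", "p2", "p3"):
    head.assign(name)
moved = head.worker_failed("w1")  # pipelines from w1 move to w2
```

A least-loaded policy keeps the sketch short; a real scheduler would more likely weigh pipeline size and worker capacity.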

    1. 1. 
    2. 2. 
    3. 3. 
    4. 4. CLASSWeb Enterprise
    5. 5.  • – • • •
    6. 6. 
    7. 7. Online display ad shown to a user → User is exposed to multiple advertising channels in time → Eventually, the user performs commercial actions
    8. 8. Online display ad shown to a user → User is exposed to multiple advertising channels in time → Eventually, the user performs commercial actions
    9. 9. [Plot: commercial actions; number of impressions, Campaign 1; number of impressions, Campaign 2]
    10. 10. •••••••••
    11. 11. 
    12. 12. Our new results

                                   Campaign 1               Campaign 2
                                   Low     Mean    High     Low      Mean    High
        AB Testing                 0.009   0.199   0.458    -0.034   0.15    0.312
        Our Attribution Model      0.044   0.068   0.119    0.094    0.18    0.519

        No cookies/user tracking
        Incorporates inferred:
        - Time series
        - Lags and decay of impact
        - Saturation
        - Multiple campaigns
        AB testing has high variability due to sparsity
    13. 13. Distribution of R² for 2000 campaigns from 1200 products. Very fast, inexpensive estimate. [Plot: probability distribution of campaign impact]
    14. 14. Now working with leading AOL customers to help enhance advertising, answering and exploring questions such as:
        • Online advertising and associated attribution
        • Optimizing campaigns mid-flight
        • Helping tune your A/B testing over time, e.g. is 50% non-exposed over 1 week better, or 17% unexposed over 3 weeks?
        • Optimization framework to achieve statistically significant attribution and minimize the cost of A/B testing over time
        • Multi-touch attribution
    15. 15. Service Request Database → Text Mining System → Knowledge Database → Applications such as retrieval
        Unstructured text is mined into knowledge: Problem (What was the problem?), Cause (Why did it occur?), Solution (How was it solved?), and Irrelevant Content.
        Finding different solutions to the same problem: similarity between Document 1 and Document 2 is high for Problem, high for Cause, and low for Solution.
    16. 16. Compare the time spent by engineers in reading service requests before and after using our system.
        Workflow: Browse a service request → Relevant? (Y/N) → Read and understand → Read enough? (Y/N) → Create knowledge article. The first decision measures the time to assess relevance; the second measures the time to thoroughly extract knowledge.

                                      Time to assess relevance   Time to extract knowledge
        Before using system           27 minutes                 97 minutes
        After using system            11 minutes                 67 minutes
        Productivity improved by      145%                       45%
    17. 17.
    18. 18. Top words per topic (stemmed):
        Topic 1: year, servic, bank, busi, compani, uk, industri, market, system, million, financi, cost, monei, fund, billion, increas, £, global, provid
        Topic 2: hamza, al, abu, charg, muslim, mosqu, terror, murder, islam, cleric, mr, hatr, told, london, trial, year, kill, masri, prosecut
        Topic 4: forc, afghanistan, troop, oper, defenc, militari, british, 000, year, afghan, nato, secretari, secur, reid, countri, mission, britain, armi, command
        Topic 5: govern, state, presid, countri, africa, elect, nation, african, intern, nigeria, s, unit, polit, parti, independ, 2005, 2004, travel, south
        Topic 6: muslim, protest, cartoon, group, polic, peopl, mr, murder, attack, islam, london, newspap, violenc, demonstr, year, danish, call, public, arrest
        Topic 8: british, court, law, govern, tortur, case, terror, suspect, evid, right, uk, alleg, rule, britain, prison, mr, legal, human, charg
        Topic 12: iraq, british, soldier, kill, iraqi, militari, bomb, troop, basra, war, forc, al, armi, royal, di, baghdad, attack, death, 2003
        Topic 14: ira, ireland, northern, sinn, british, mr, fein, polic, govern, irish, donaldson, belfast, republican, year, parti, spy, polit, member, charg
        Topic 18: iran, nuclear, council, secur, iranian, russia, meet, tehran, foreign, iaea, enrich, intern, diplomat, russian, weapon, state, uranium, britain, atom
    19. 19. 
    20. 20.
    21. 21.  • • • • • • • • • •
    22. 22.  • • •
    23. 23. 
    24. 24. 
    25. 25.
    26. 26. Given a set of documents, we want to identify the main areas or topics discussed in an unsupervised manner. We take advantage of the semantic associations between words across the documents: if two words appear in the same document, they should be related.
        Example word groups: Music (notes, instrument, play); Sports (ball, net, racquet).
        For each topic we have a different distribution of words, and each document might contain material about a variety of topics.
        [Figure labels: Topic 1 (80%), Topic 2 (5%), Topic 3 (20%); Sports; Common Words]
    27. 27.
    28. 28. Top words per topic (same word lists as slide 18)
    29. 29. Our Method

        Retrieval schemes      Baseline               Our Method
        Retrieval models       Deterministic model    Probabilistic model
        Information            The whole document     The semantically labeled paragraphs
        Domain knowledge       None                   Dictionary

        • labeled paragraphs
        • Using domain knowledge further improves retrieval results.
        • Result 3: Probabilistic recommender outperformed deterministic recommender.
    30. 30. [Diagram: Service Request → Preprocessor → Feature Generator → Hierarchical Classifier → Labeled Paragraphs → Recommender → User; Expertise, Domain Knowledge, and Analyzer feed the pipeline. Legend: data flow; data flow of Recommender; data output for User]

        Type                   Feature                                                                          Class and motivation
        Statistical features   Length of paragraph                                                              Short paragraphs are usually irrelevant.
                               Relative position of a paragraph in a service request                            Service requests have the hidden process “problem → cause → solution”.
                               Bag-of-words
                               Number of “%”                                                                    Error codes (relevant) begin with “%”.
        Contextual features    Contain “Hi”, “Hello”, “my name”, or “I’m”                                       Introduction, irrelevant
                               Contain “feel free”, “to contact”, or “have a ... day”; begin with “Best” or “Thank”   Salutation, irrelevant
                               Telephone number, zip code, or affiliation                                       Contact information, irrelevant
                               Contain “problem”, “error message” or “symptom”                                  Problem
                               Contain “suspect”, “seem”, “looks like”, “indicate”, “try”, “test”, or “check”   Troubleshooting hint
                               Contain “recommend”, “suggest”, “replace”, “reseat”, “RMA”, or “workaround”      Solution
        Lexical features       Number of words from domain dictionary                                           Usually relevant
                               Product name                                                                     Usually relevant
    31. 31. - Internetworking Terms and Acronyms Dictionary (ITAD)
        - Benefits: (1) the expansion of acronyms and terminology; (2) the enhancement of concept dependencies.
        - Example (measuring similarity):
          Snippet from Doc1: The phone boots up and it does a DHCP [Dynamic Host Configuration Protocol. Provides a mechanism for allocating IP addresses dynamically so that addresses can be reused when hosts no longer need them] request in the native VLAN [virtual LAN]. There it gets an IP address [32-bit address assigned to hosts using TCP/IP] and an option that it needs to boot up in the VLAN 40 and that it need to go in trunking [physical and logical connection between two switches across which network traffic travels] mode.
          Snippet from Doc2: Host Server with 2 interfaces [connection between two systems or devices] and one default gateway. When ping Vlan-B [virtual LAN] interface an ARP [Address Resolution Protocol. Internet protocol used to map an IP address to a MAC address] request with a source IP of Vlan-B is sent to Default Router [network layer device that uses one or more metrics to determine the optimal path along which network traffic should be forwarded. Routers forward packets from one network to another based on network layer information] on Vlan-A, but Router does not respond to ARP request.
          Legend: […]: explanation from ITAD. Blue: overlapping words between unexpanded excerpts. Red: overlapping words introduced by ITAD.
    32. 32. Online display advertising is an area of rapid growth and consequently of great interest as a marketing channel.
    33. 33. - How do I allocate marketing budget for maximum ROI? - How effective is my marketing campaign? - What is the impact of any channel on sales?
    38. 38.  Analytics + Applications
    39. 39. CLASSWeb Enterprise
    40. 40. Online Display: Ad shown to a user; User is exposed to multiple advertising channels in time; Eventually, the user performs commercial actions
    41. 41. Residual: Variance remaining from time series dependencies. Total: Variance remaining from advertising campaigns (based on ad impressions). Variance Attribution: proportion of variance described by campaigns.
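One way to read this slide is as a two-step decomposition: first remove time series dependencies from the outcome series, then ask what share of the remaining variance a campaign's impressions describe. The sketch below is an assumption-laden illustration, not the presenters' model: the AR(1) fit, the one-variable least-squares regression, and the function names `ar1_residuals` and `explained_share` are all hypothetical choices.

```python
from statistics import mean


def ar1_residuals(series):
    """Remove a first-order time dependency y[t] ~ a + b*y[t-1] by least squares.

    AR(1) is an assumed stand-in for whatever time series model the slide refers to.
    """
    prev, curr = series[:-1], series[1:]
    mp, mc = mean(prev), mean(curr)
    sxx = sum((p - mp) ** 2 for p in prev)
    b = sum((p - mp) * (c - mc) for p, c in zip(prev, curr)) / sxx if sxx else 0.0
    a = mc - b * mp
    return [c - (a + b * p) for p, c in zip(prev, curr)]


def explained_share(residuals, impressions):
    """R^2 of a one-variable least-squares fit of residuals on ad impressions.

    Interpreted here as the proportion of remaining variance described by a campaign.
    """
    mr, mi = mean(residuals), mean(impressions)
    sxy = sum((i - mi) * (r - mr) for i, r in zip(impressions, residuals))
    sxx = sum((i - mi) ** 2 for i in impressions)
    syy = sum((r - mr) ** 2 for r in residuals)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation to attribute
    return (sxy ** 2) / (sxx * syy)
```

In use, `ar1_residuals` would be applied to the commercial-action series and `explained_share` to each campaign's impression series aligned with those residuals; with several correlated campaigns a joint regression would be needed instead.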
    44. 44. Specific & Users: Informational High Probability of & & positive feedback GenericQueries: Initial Final Query Query
    45. 45. Unstructured Text → IC → Problem, Cause, Solution, Irrelevant Content
    46. 46.  3 related SRs IC 1-2 hours 80 pages
    47. 47.  Our approach Typically
    48. 48. User expects to find more relevant results each time she interacts with the system. Queries: q1: elderly depression; q2: depression symptoms; q3: symptoms and treatment. SEARCH results: DOCTOR: Depression treatment of patients… SOCIAL SCIENTIST: Depression influence on family relationships… Relevance of the presented documents depends on user context. 10/6/2011
    49. 49. Tim Dombrowski, Partner, 05.16.12. This information is confidential and was prepared by Andreessen Horowitz (“The Firm”) for exclusive use with its partners. It is not to be referenced, published, or presented without “The Firm’s” prior written consent.
    50. 50. A16Z Overview. Background: Founded 2009; $2.7B assets under management; 50+ employees. Investment Focus: Technology; Best in sector; U.S. based; Silicon Valley. Funding: Seed; Venture; Growth; Pre-IPO franchises.
    51. 51. One Team 6 General 30 Partners Partners Marc Andreessen Ben Horowitz Jeff Jordan Peter Levine Market Deal Network Executive Technical Marketing & Portfolio John O’Farrell Development & Research Talent Talent Positioning Management Scott Weiss
    52. 52. Portfolio Overview Consumer Enterprise Commerce Media Enterprise Security Applications Social Mobile Infrastructure Big Data Software Gaming Marketplaces Storage Mobile Electronics Payments Networking Hardware
    53. 53. Enterprise Portfolio Infrastructure Software Enterprise Applications Security BigData Mobile Payment Networking Hardware Storage
    55. 55. SnapLogic Overview
    56. 56. Agenda • Company Overview • snapLogic: Data Sharing Platform • Customer Use Cases
    57. 57. Strong Team, Strong Backing. Gaurav Dhillon – Founder, CEO (Co-Founder, former CEO, Informatica); Scott Edgington – SVP, Global Field Ops (Troux, Voltage, BEA, PTC); John Schuster – VP, Engineering (Cisco, IronPort, NetApp); Chris Wagner – Chief Architect (Cisco, IronPort, NetApp, sgi, Convergent, Bell Labs); Ediz Ertekin – VP, Global Services & EMEA (Verix, Informatica, Sybase); Ash Jhaveri – VP, Product Management (Google, Microsoft, MicroStrategy); Lisa D’Alencon – Chief Financial Officer (Bridgeway, Bitfone, cc:Mail/Lotus, PwC). Strong Support
    58. 58. Selected Customers
    59. 59. SnapLogic: Data Sharing. Pre-2000: Legacy Data (ESB, RDBMS); 2000: Enterprise Cloud; 2007: Consumer Cloud; 2012: Big Data
    60. 60. Challenges. The Problem: • Technology - TCV mismatch with legacy architectures - Data Types - Complexity - Velocity & Volume • Business Model - Cost per end point of legacy ETL and EAI products - Not in line with SaaS pricing models - Volume of applications requiring integration. The Result: • Regression back to custom point-to-point integrations
    61. 61. Data Sharing Platform. Connect: Any App, Anywhere. Scale: Up & Down Quickly. Design: Simply & Visually, Broad User Base. Translate: All Data, Structured & Unstructured, Any Protocol. Extend: Open SDK & APIs, Loose Coupling (REST). Enrich: Add, Combine, Cleanse, Process. Deliver: On-time & Real-time, Batch & Streaming. Deploy: Cloud & On-Premise, Public & Private. Enable Enterprise Introspection
    62. 62. Connect• Connect to data wherever it lives - Modern, web oriented architecture connects everything - Applications, databases, and filesystems @ any location - Virtual, physical, cloud, on-prem, public, private Snap Loose Coupling Easy, Flexible Changes
    63. 63. Translate & Enrich • Translate with Snaps - standardized interfaces - Between data formats & protocols, modern & legacy - Snap in once, share data with all: Apps, DBs, ESBs, … • Enrich using built-in tools and applications - Operators (e.g. join), MDM, Data Cleansing… - Functional, stateless approach
    64. 64. 120 Snaps and counting. Easily Extensible: • SDK, APIs, IDEs • Java, Python. Distribute: • Build or Buy • snapLogic & 3rd parties. Leverage Expertise
    65. 65. Deliver • Share results on-time and in real-time • Batch/Schedule & Streaming • Event based triggers (via URLs) • Simultaneously to multiple destinations • Create different views of the same data - For different lines of business or groups
    66. 66. Scale On-Demand To Match Load. Vertical Scalability • Processing Optimizations • Multi-threaded • Zero local data copies • Bulk operations. Horizontal Scalability • Scale up or down quickly • Static configuration: Cluster - HA/Failover • Dynamic: Behind Load Balancer - Deploy images on demand - “Infinitely” scalable
    67. 67. App Connectivity Across Domains • Application Connectivity Everywhere • Deploy Anywhere: Public Cloud; On-Prem & Hosted in Cloud; Private Cloud; On-Prem
    68. 68. Design: 100% Web-based
    69. 69. Every Component Has A URL: https://demo.snaplogic.co
    70. 70. REST Based Location Independence
    72. 72. Design: HTML5 iPad App
    73. 73. Customer Deployments
    74. 74. Selected Customer Examples HR Workflow Orchestration Big Data Process Orchestration Social, Enterprise, Cloud, Big Data
    75. 75. Powering Outback’s HR System. Source: Ultipro. Flow: Queue the event, save details to DB, set destination flags → Look at event type, call appropriate destination pipelines. Destination pipelines: Exchange, AD, LMS, Travel Leader, Expense wire, Comp Card. Database: All Employees latest data, event Queue, Logs, Business rules
    76. 76. Outback Steakhouse Plans Real-time promotions POS Data engagement Location influence Location and fans Customer Inventory Database reviews On Premise at OSI
    77. 77. Big Data Reference Architecture 1 2 3 Connect Translate & Enrich Deliver StructuredRelational Data BI Hadoop Data Refinement Table Cloud & On-Prem Data DB Structured & (rows) Unstructured Data snapLogic - Hadoop View Unstructured Integration via Hive & HDFS Hierarchical Data (social)
    78. 78. Social Meets Enterprise
    79. 79. Gamer Relationship Management • User information • Streams • Stats • Full FQL access • Achievements • Community• Custom snap (limited vendor API) • Custom snap (limited• Playback information vendor API) and analytics• Other videos viewed• Video comments Process 25M records• Enrich customer every 2 hours database with 1.5B records information from processed every 3 gaming networks and hours social media sites• User provisioning & Activision fraud detection Customer• Gamer assistance 92 DB Corporate Environment
    80. 80. Major Electronics Retailer exampleBRICK AND MORTAR STORES Store view Real time (real-time inFeatures streaming memory cache)Sticker to placeon physicalobject in store Real time On-line view streaming (real-time in memory cache) Nightly Video content batch Continuous data Accounting enrichment view (database) Event driven Master SKU Database Inventory view (real-time in Accessories to memory cache) co-sell ONLINE BUSINESS
    81. 81. Pandora– Nothing but Cloud
    84. 84. Thank You!
    86. 86. Microbial cells outnumber human cells. Average adult human body: ~10^13 cells. Microbial content in human: ~10^14 cells. • Digestion of food • Synthesis of essential vitamins and amino acids • Break down toxins • Fight disease-causing microbes • Source of antibiotics • Disease diagnostics
    87. 87. Questions addressed• What are the different microbes present in a given environmental habitat/niche ?• What are their relative proportions ?• How do they function ?• What is the role of each individual microbe or group of microbes ?• How do they interact ?
    88. 88. The real challenge for NGS and Metagenomics lies not in data acquisition but in performing a meaningful analysis of the same. Challenges: • Data storage - Metagenomics samples sequenced using NGS technologies generate millions of DNA sequences • Data quality - Presence of low quality sequences in NGS data • Data consistency - Length of sequences is short and varies with NGS platform: (A) 454 (Roche): GS20 – 100 bp; FLX – 250 bp; Titanium – 400 bp; (B) Illumina (Solexa): ~150 bp; (C) ABI SOLiD: ~50 bp • Gaps - Few algorithms available for analyzing NGS data obtained from metagenomes. Lack of one-stop analysis platforms for analyzing metagenomics NGS data.
    89. 89. Expected Taxonomy OUTCOMESHEALTHCARE 16S based Drug targets,Data from healthy / Who is there? 1 i-rDNA Novel genesdiseased individuals What are their relative 2.C16S Biomarkers proportions?BIOPROSPECTING WGS based 1. Sort-ITEMSData from diverse Industrially 2. DiScRIBinATEenvironmental important 3. ProViDEniches microbes 4. INDUS Pre-processing Comparative 5. SPHINX 6. TWARIT MetagenomicsAGRICULTURE 1 Quality Control. Biocides,Data from 2. EuDetect TCS Algorithms 1 HabiSign fertilizers,agricultural soils, 3. GRID 2. Community Pest controlpest microbiomes 4. MetaCAA Functional Analyser measures ProfilingENVIRONMENT How do they 1 COGNIZER function?Data from oil spills, 2. Gene Role of each Novel Bio-landfills, industrial Prediction microbe & microbial remediation groups?drainage etc., How do they strategies interact?
    90. 90. Gut microbial samples from healthy children. Gut microbial sample from malnourished children. Extracted and sequenced genomic content using next generation sequencing technologies. Identified microbial groups/genes/proteins and pathways specific to healthy and malnourished children. Useful for devising probiotics & nutritional strategies using this information*. * Gupta SS et al., “Metagenome of the gut of a malnourished child”, Gut Pathogens, 3:7, (2011)
    91. 91. Courtesy: Dennis Freeman
    92. 92. The unexpected application of cheap sequencing • Despite the obvious possibilities of sequencing many new genomes, high throughput DNA sequencers have instead been mainly utilized as bean counters for “sequence census” methods. • The majority of DNA sequence currently produced is for *-seq experiments: Desired measurement → (reduce to sequencing) → Sequence → (solve inverse problem) → Analyze. Disciplines: Creativity, Biology, Computer Science, Mathematics/Statistics, (Computational) Biology. • Assays include: ChIP-Seq, RNA-Seq, methyl-Seq, GRO-Seq, Clip-Seq, BS-Seq, FRT-Seq, TraDI-Seq, Hi-C, SHAPE-Seq...
    93. 93. Example: SHAPE-Seq. Experiment (+ / -, Multiplexing) → Sequencing → Read Alignment → Counting → Statistical Inference: Infer reactivities from measurements
    94. 94. Example: S. aureus plasmid pT181 sense RNA. Initial rate estimate 5’ 3’ • S. Aviran, C. Trapnell, J.B. Lucks, S.A. Mortimer, S. Luo, G.P. Schroth, J.A. Doudna, A.P. Arkin and L. Pachter, Modeling and automation of sequencing-based characterization of RNA structure, Proceedings of the National Academy of Sciences, (2011)
    98. 98. What is our security R&D goal? • Right Information to Right User for Right Purpose. Why? • Achieves Minimum Information Disclosure in an enterprise, thus minimizes the attack surface. • The best protection even against insider attacks! • 31% of all data breaches attributed to malicious insiders (2010)* • Remember our associates are your insiders! • Incredibly important area of work for us. • Little applied and foundational work available when we started. Challenge • Identifying the right! * The Risk of Insider Fraud, U.S. Study of IT and Business Practitioners by Ponemon Institute, Oct 2011.
    99. 99. What data to protect from whom? • Template based watermarking • Logic formalization of HIPAA in • TCS-CA: Indias largest issuer the de-factoPatentedA high utility, privacy data • Now • open source • lightweight dynamic How to protect? DRM• Privacypreservinggit’ generation publications: 15 • videos for digital Enterprise of digital certificates and utility preserving collaboration with Stanford access control layer for the data solution that is non- masking Research • Tailor the detection to a document masking • Rights tied static data University • Full life-cycle tool (2 of our • Largest users: Fedora, KDE; technologyintrusive and easy-to-deploy Patents applied: 6 mechanism *after* the attack happy customers • Automated HIPAA compliance • Many "ID", not the document traction 5 Awards: competitors use it too!)its correctness? • Many happy customers with BPO Guarantee of thousands of smaller ones• Strong monitoring has happened! eDRM 20122002 Watermarking Gitolite HIPAA Watch Right User Right Information Right Purpose RP RP RP RP RI RI RI RI RI RI RU RU RU RU RU RU RU RU
    100. 100. Challenges faced • Environment heterogeneity • Requirements ambiguity • Evolving compliance landscape • Proofs of solution trustworthiness. Concepts developed • Policy codification • Policy externalization • Minimal intrusion • Platform based solution
    101. 101.  The GRC Universe Global Macro Analysis and view on future Adopting GRC – TCS POV  Organizational challenges  Integrated GRC Landscape TCS presence in GRC Key takeaways
    102. 102. Dodd-Frank Basel Credit Risk AML Act Spread Risk COSO Market Risk Counterparty Risk Fraud Risk Operational Risk Interest rate Risk Currency Risk Macro Risk IFRS Volatility Risk Risk to Physical AssetsSOX Concentration Risk Process Risk Correlation Risk Natural Calamities Liquidity Risk People Risk Supervisory Risk FCRA Political Risk Systemic Risk MiFiD Inflation Risk Model Risk Reputation Risk Sovereign Risk Legal Risk Contagion Risk Accounting Risk SEPA AZ/NZS Information Security Risk FCPA FATCA
    103. 103. APAC: • Japan - Risks of Natural Disasters, High Debt, Volatile Currency, High Dependence on exports • China - Undervalued currency, high dependence on exports, huge investments in Euro Zone & US • India - Political Risk, High Inflation, Current A/c deficit, Hostile Neighborhood, High dependence on Oil imports • Australia - Relatively stable economy, sensitive to commodity shocks, coupled to Europe and NA. US: Loss of AAA rating, Huge Debt, High Unemployment, Slow recovery. Europe: Risk of Euro breaking up, Greece, Spain, Portugal, France - Ratings cut, Huge Debt, On the verge of Default, Germany and France - Economy slowing, UK - High Debt, Second Recession. • Dodd Frank • Solvency II • MiFID • IFRS • ERM • Integration of Risk and Finance • FATCA • Basel III • Mobility
    104. 104. Conceptual inputs source: Deloitte Research
    106. 106. Operational Market Risk Basel/CRD FATCA MetricStream Risk Risk Compliance SAS Credit Risk Liquidity Risk DFA Solvency Consulting Current State Assessment, Target State Roadmap, Gap Analysis, Architecture Review, Product Selection Fermat Solution Implementation System Build, Configuration, Customization, Integration SunGardServices Data Management Data Sourcing, Validation & Transformation, Data Warehouse OFSAA Analytics Model Building, Model Validation, Model Recalibration, Model Management & Maintenance Assurance System Testing, UAT Support, Internal Parallel Run, External Parallel Run Solution Accelerators - TCS IPs Basel 2 and 3 LRM Implementation Risk Assurance KPI Market Risk ALMImplementation Framework Framework Framework Framework Framework Framework Enterprise Risk Architecture DFA Heath Checkup Framework
    108. 108. “We must distill down vast amounts of data into security intelligence — prioritized, actionable insight. To prioritize actions, there must be linkages to the business value of the assets and an improved understanding of the risk they represent.” - Gartner. Source: Information Security Is Becoming a Big Data Analytics Problem, Published: 23 March 2012, Gartner research by Neil MacDonald
    109. 109. Model, Simulate, Act Community ContextPatterns, meaningful Knowledge anomalies Analyze InformationDependencies, Collect, Correlate relationships Big Data Data Data Data Data Logs, Events, Costs, Usage, Attacks, Breaches Source: Information Security Is Becoming a Big Data Analytics Problem Published: 23 March 2012 Gartner research by Neil MacDonald
    110. 110. Applications; Business Units; Processes; Facilities; Devices; Information; Contacts • Visibility • Accountability • Collaboration • Criticality
    112. 112.  Study Released 16 May
    113. 113. Vision: Integrated Governance, Risk & Compliance (GRC) and Quality Management for Better Business Performance. Solutions: • Audit Management • Risk Management • Corporate and Supplier Governance • Regulatory and Operational Compliance • Quality Management. Backing: • Kleiner Perkins Caufield & Byers (Google, Amazon, Cisco, Genentech) • Integral Capital Partners • 600+ employees with profitable operations and strong growth. Analyst Ratings: • Gartner Magic Quadrant: “Leader” • Forrester Wave: “Leader” (Forrester GRC Wave Q4 ‘11)
    114. 114. Metrics Simulation/AnalyticsSolutions Regulatory Operational Internal Supplier IT GRC Quality Mgmt Compliance Risk Mgmt Audit Mgmt Governance 3rd-PartyGRC Platform AppStudio Products Applications Policy & Supplier/ Risk Compliance Audit Issue Document Vendor … Other … Mgmt Mgmt Mgmt Mgmt Mgmt Mgmt ComplianceOnline AppXchange Content Forms Workflows Data Standards/Templates Community Risks Processes Controls Assets Organizations Regulations GRC Feeds Alerts & Feeds Security Alerts Dashboards Infolets Offline Briefcase Documents
    115. 115. Area of Functions/Compliance Standards Processes Risks Controls Control Tests• FCPA • IT • Process 1 • Risk 1 • Control 1 • Control Test 1• UK Anti-Brib. • Function 1 • Process 2 • Risk 2 • Control 2 • Control Test 2• CIA … • Process 3 • Risk 3 • Control 3 • Control Test 3• PCI … … …• SOX … … … … … … … … … … … … Policies/ References Documents Risk Assessments Issues • Regulation 1 • Policy 1 • Action Plan • Risk-Based • Regulation 2 • Implement • Procedure 1 • Requirement-Based • Standard 1 • Business Unit-Based • Monitor • Standard 2 • Work Instruction 1 … … … … …
