Controlling Leakage and Disclosure Risk in Semantic Big Data Pipelines

author: Ernesto Damiani, Khalifa University
published: July 28, 2016, recorded: June 2016




In many Big Data environments, information is made available as huge data streams that are collected and analyzed at different locations, asynchronously, and under the responsibility of different authorities. It has become common for data analysts to have a mandate to compute Big Data analytics without holding the rights to access the individual data points in the input, which may contain sensitive information or personal data protected by privacy regulations. This talk discusses the idea that techniques used for semantic enrichment of Big Data (such as semantic lifting, which harmonizes metadata representation across data collection points, and pre-joins at data ingestion time, which avoid computing semantic joins on Big Data storage) can be seen as non-linear boosters of leakage and privacy risk. Intuition suggests that semantic techniques applied to Big Data representation may have a double impact on security risks: (1) they increase leakage risk by increasing the value to the attacker per unit of information leaked, and (2) they increase intrusion risk by making injection attacks (i.e., attacks aimed at poisoning data to subvert the outcome of analytics) more effective per unit of poisoned information injected. However, no clear methodology is currently available for quantifying the impact of these boosters. This talk will discuss a (semi-)quantitative technique for computing Big Data leakage risk estimates, so that they can be meaningfully compared with the quantifiable benefits of semantic enrichment. It will also discuss a model and a toolkit for protecting semantically enriched data streams based on the idea of dynamic filters, built incrementally from the applicable access control policy and the analytics to be performed.

Slides: eswc2016_damiani_data_pipelines_01.pdf (6.3 MB)
