Browser data eviction criteria hit: All data effectively lost

OS: OSX 10.15.2 (19C57)
Browser: Firefox 72.0.2 (64-bit)
Extension: 1.5.3

I had commented on this thread but it appears to be dead at this point: [SOLVED] "requested database object could not be found"

After hitting the 2GB limit in Firefox, I can't view any of my categories, tags, or bookmarked pages and I constantly get the warning that data may be lost soon.

At this point I just want to get a list of URLs and the categories/tags that were attached to them. Backing up to GDrive is useless because Google compartmentalizes app-stored data, so it is not accessible to me. Backing up locally requires the external helper app, but it does not work.

How can I just get a simple JSON or XML export of URLs and tags?

Sorry I missed your response in the other thread.
So there are 2 things you can do now:

  1. Make a copy of the SQLite and .files files just in case, because we may soon be able to recover those by other means. We are working on an API that would allow you to query the data more effectively from outside the extension.
  2. You can indeed try to purge the images. I've tried it with my extension and it worked, but that was Chrome and not with a potentially locked database. What you need to do is go to "about:addons" > "settings" > "debug addons" > "Inspect" on the Memex item > "console" and then type and confirm:
    await storageMan.backend.dexieInstance.table('pages').toCollection().modify({ screenshot: null })

This process can take a while, or it may immediately fail because of the storage blockers. In that case we would need to wait until we have the API out and try again. When it is successful, the console will after a while print a number, which is the number of pages it processed.
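If you want to check whether the purge actually went through, a rough check from the same console is to count how many pages still have a screenshot attached (a minimal sketch using Dexie's filter() and count(); it assumes the same storageMan handle and 'pages' table as the command above):

// Should report 0 (or close to it) once the screenshots have been nulled out
await storageMan.backend.dexieInstance.table('pages').filter(p => p.screenshot != null).count()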

Sorry about all those troubles. You won't believe how hard it is to make Memex work offline-first in the browser.
It's been very challenging, because browsers sometimes only partially obey the standards (as you've experienced now) or don't at all. Some low-level database implementations even differ from operating system to operating system. So in essence the browser/operating system combination creates a whole spectrum of different setups we need to take into account.

On top of that, unlike with any other development flow, we cannot roll out updates selectively to some users to test; we always need to roll them out to everyone, and we have to wait 3-5 days on every update we push because of the Chrome store's review policy. Anyhow. I feel for you, this is a super shitty situation, and I hope we can resolve it soon.

For anyone else who is so inclined, the shortcut for inspecting/debugging this extension in FF is:

about:devtools-toolbox?type=extension&id=info%40worldbrain.io

I ran the update and it appears to have cleared thumbnails. Also ran a count of pages and got 12,009 back. Still getting the missing object error and storage limit warnings but will try a browser restart later today.

Thanks @tautologistics for the updates and the shortcut to get to the console!

Hope it is solved with the restart. Keep us posted :pray:

Tried a bunch of things… browser restart, OS restart, and some addon debugging, but nothing good. Still have a bloated set of data (2GB) and hourly warnings of data loss.

Finally got the local backup helper to work (sort of)… now I have a 102KB images file and a 2.8MB content file, but the addon indicated 468MB of exportable data. It appears to be an incremental backup based on some previous backup state/timestamp (which was never successful), and I have not found any way to reset the backup state or initiate a complete, full backup.

So even now I don't have a complete JSON file that I can parse and transform to some other format.

OK, great, that's a start!

To reset the backup, change the folder via the extension UI and you'll be prompted to restart the backup.
If for some reason this does not work, reset your initial backup time manually: go back to the debug screen > Storage > Local Storage > moz… and delete the entry LastBackup. Delete the backups you already made, reload the tab with the backup screen, and restart the backup.
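If the storage panel is hard to navigate, the same entry can likely be removed from the extension console instead (a sketch using the WebExtension storage API; whether the backup time really lives under this exact key in browser.storage.local is an assumption based on the entry name above):

// Remove the stored last-backup timestamp so the next run starts a full backup
await browser.storage.local.remove('LastBackup')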

Let me know if all worked!

@BlackForestBoi

Had a little free time and wrote something to export the data I care about (see below). This dumps all pages (url, title, and tags) that are contained in a list, e.g.:

{
  "1571755589344": {
    "name": "Ideas",
    "id": 1571755589344,
    "pages": [
      {
        "url": "http://www.altx.com/manifestos/rozztox.html",
        "title": "The ROZZ-TOX Manifesto",
        "tags": ["philosophy", "80s", "art"]
      }
    ]
  }
}

It's unfortunate that it had to come to this, because I think the concept is great and the search works well; however, the idea that users own and control their data is quite a stretch at this point. Even though data is stored locally, if it can't easily and reliably be extracted, then the data's locality is meaningless.

Charging for features, having a smooth import process, and claiming "Memex is offline first. You have full control over your data." all while direct access and export of data is at best a work in progress, leaves a bad taste in one's mouth and gives the impression, whether intentional or not, of trying to retain users by locking their data in.

I sincerely wish you well and success, with the hope that the company's stated goals and values eventually manifest themselves in the product you are building.

// about:devtools-toolbox?type=extension&id=info%40worldbrain.io

// Fetch all lists, list-page memberships, and tag entries in parallel
await Promise.all([
  storageMan.backend.dexieInstance.customLists.toArray(),
  storageMan.backend.dexieInstance.pageListEntries.toArray(),
  storageMan.backend.dexieInstance.tags.toArray(),
]).then(async ([lists, listPages, tags]) => {
  console.log('Data fetched... building export object');

  // Index lists by id, each starting with an empty pages array to be filled below
  let result = lists.reduce(
    (result, list) => {
      result[list.id] = { name: list.name, id: list.id, pages: [] };
      return result;
    },
    {}
  );

  // Build a url -> [tag names] lookup so each page's tags can be attached below
  let tagIdx = tags.reduce(
    (result, tagEntry) => {
      let tagList = result[tagEntry.url] = result[tagEntry.url] || [];
      tagList.push(tagEntry.name);
      return result;
    },
    {}
  );

  // forEach with an async callback returns before the lookups finish,
  // so explicitly wait for all page lookups before returning the result.
  await Promise.all(listPages.map(async listEntry => {
    let url = listEntry.pageUrl;
    let page = await storageMan.backend.dexieInstance.pages.where('url').equals(url).first();
    let list = result[listEntry.listId];
    if (page && list) {
      list.pages.push({ url: page.fullUrl, title: page.fullTitle, tags: tagIdx[url] || [] });
    }
  }));

  return result;
}).then(console.log).catch(console.error);
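If you want the output as a file rather than a console object, one option (a sketch assuming the copy() console helper is available in the Firefox devtools toolbox) is to stringify the result and copy it to the clipboard, then paste it into a .json file:

// Replace the final .then(console.log) above with:
.then(result => copy(JSON.stringify(result, null, 2)))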

@tautologistics

Wow awesome initiative Chris.

I know things are not as optimal as they could be in terms of data access. For us it's really a resource issue at this point, and we can't build all those features while trying to get a sustainable business up and running that does not take venture capital. It's more a matter of priorities than a lack of desire to make all of that smooth. Although it definitely could be better, I don't feel we are violating our values here. After all, you own your data if it's all on your computer, and the fact that you could write a script like that in a few hours speaks to that too. I get your issue though; it's not user-friendly or finished at all.

The good thing is that we are about to release StorexHub, which is essentially an offline-first, Zapier-like API, also for Memex. You can actually already work with it. We are just polishing up the documentation so that people can get started with it.
We built it because we know there will be many use cases for how people want to export, import, and analyse Memex data or use different technologies (like IPFS), and we won't ever be able to cover them all. So we built an interface where developers can do it themselves without our central control.

The little script you built would make a great app in there to help people export their data.